DRL-Based V2V Computation Offloading For Blockchain-Enabled Vehicular Networks
Abstract—Vehicular edge computing (VEC) is an effective method to increase the computing capability of vehicles, where vehicles
share their idle computing resources with each other. However, due to the high mobility of vehicles, it is challenging to design an optimal
task allocation policy that adapts to the dynamic vehicular environment. Further, because vehicular computation offloading often occurs between
unfamiliar vehicles, motivating vehicles to share their computing resources while guaranteeing the reliability of resource allocation
in task offloading is a major challenge. In this paper, we propose a blockchain-enabled VEC framework to ensure the reliability and
efficiency of vehicle-to-vehicle (V2V) task offloading. Specifically, we develop a deep reinforcement learning (DRL)-based computation
offloading scheme for the smart contract of blockchain, where task vehicles can offload part of computation-intensive tasks to
neighboring vehicles. To ensure the security and reliability in task offloading, we evaluate the reliability of vehicles in resource allocation
by blockchain. Moreover, we propose an enhanced consensus algorithm based on practical Byzantine fault tolerance (PBFT), and
design a consensus nodes selection algorithm to improve the efficiency of consensus and motivate base stations to improve reliability
in task allocation. Simulation results validate the effectiveness of our proposed scheme for blockchain-enabled VEC.
Index Terms—Vehicular edge computing (VEC), computation offloading, blockchain, deep reinforcement learning (DRL)
decentralized, transparent and immutable manner [11], [12]. Moreover, blockchain relies on data transmitted by the P2P network layer while vehicular edge computing (VEC) paradigm originates from P2P but extends the peer to vehicles, thus blockchain and VEC have the same distributed network framework [13], [14]. Besides, blockchain can be regarded as a distributed database, and blockchain consensus requires massive computation, while in VEC, vehicles can access edge servers for storage and computing services. Therefore, blockchain and VEC have the same functions of storage and computation. It is worth noting that in a public blockchain, all distributed nodes need to participate in the consensus process, and it takes a long time in block generation and verification, which is not suitable for high dynamic vehicular networks. As a result, some works [15], [16] employed consortium blockchain in the VEC system, where only a part of BSs are authorized to participate in the consensus process of blockchain. To improve the efficiency of consensus, some works [17], [18] proposed non-compute-intensive algorithms based on practical Byzantine fault tolerance (PBFT), which are suitable for lightweight IoV-oriented blockchains.

Furthermore, there are still some challenges to be solved in blockchain-enabled VEC. First, due to the dynamic vehicular environment, it is challenging to establish an efficient mechanism for vehicular task offloading with the consideration of vehicle’s mobility and heterogeneous vehicular computing capability. Second, how to motivate vehicles to contribute their onboard computing resources and meanwhile guarantee the reliability of onboard resource allocation is also a problem that should be carefully investigated. Third, the performance of PBFT-based consensus protocol in consortium blockchain depends on the number of consensus nodes and the block size. Therefore, how to design a scheme that jointly optimizes the consensus nodes selection and block size with the aim of improving the performance of blockchain is also a challenge.

Motivated by the challenges mentioned above, we design a VEC framework based on a lightweight blockchain, which makes V2V computation offloading performed in a secure, reliable, and efficient manner. Besides, considering the dynamic vehicular environment, the policy of vehicular task offloading should adapt to the state of the vehicular network, we thus utilize deep reinforcement learning (DRL) in vehicular task allocation, where the policy of task offloading is updated in real time by observing the state of the vehicular network. The main contributions of this paper are summarized as follows:

- We propose a blockchain-enabled VEC framework, where BSs are utilized to maintain a consortium blockchain. The blockchain is responsible for evaluating the reliability of vehicles according to historical transactions and further motivating vehicles to make appropriate resource allocation. Moreover, the smart contract in blockchain facilitates the sharing of vehicular computing resources on demand by automatically running the vehicular computation offloading algorithm, and guarantees the security and reliability of task offloading.
- We develop a smart contract-based vehicular task allocation scheme, where dynamic pricing and DRL are utilized to motivate vehicles to contribute their idle computing resources and improve the efficiency and reliability of V2V task offloading. Furthermore, to make the proposed scheme adapt to the dynamic vehicular environment, the vehicular computing capability, the V2V link state, and the reliability of service vehicles are all considered in task allocation.
- Since the reliability of service vehicles is evaluated by previous task offloading events recorded in the transactions, and the accuracy of the estimated reliability depends on the timeliness of the transactions, we design a PBFT-based consensus, and develop a DRL-based algorithm to improve the efficiency of transaction verification by jointly optimizing consensus nodes selection and block size, and meanwhile motivate BSs to improve reliability in task allocation.

The remainder of the paper is organized as follows. The related works are presented in Section 2. In Section 3, we present the architecture of the blockchain-enabled VEC. Section 4 describes the V2V computation offloading algorithm in smart contract. The consensus nodes selection in the blockchain is detailed in Section 5. Finally, we present the simulation results and conclusion in Sections 6 and 7, respectively.

2 RELATED WORK

2.1 Vehicular Edge Computing
The problem of vehicular task offloading in VEC has been widely investigated in recent works, and some optimization methods have been utilized to solve the problem. In [19], the task offloading problem was solved by a modified Ant Colony Optimization algorithm. In [5], the problem of computation offloading in VEC was formulated as an auction-based graph job allocation problem and solved by a structure-preserved matching algorithm. To minimize the overall response time of offloading tasks with interdependency, a modified genetic algorithm-based task scheduling scheme that considered the instability and heterogeneity of VEC was developed in [20]. Moreover, some learning-based methods were also proposed to solve the problem of vehicular task offloading. In [21], a DRL-based resource allocation framework with multi-timescale was proposed by considering the constraints of vehicle mobility and task delay in VEC. In [1], authors designed an imitation learning-enabled online task scheduling algorithm for VEC. Since executing offloading tasks occupies a vehicle’s onboard computing resources and consumes the vehicle’s energy, some vehicles may not be willing to execute offloading tasks. To motivate vehicles to contribute their idle computing resources, some incentive mechanisms were proposed in recent works. For example, authors in [22] designed a contract-based incentive mechanism that combined resource contribution and resource utilization, while in [23], a timeliness-aware incentive mechanism was proposed to stimulate the participation of vehicles with the consideration of the vehicle’s uncertain travel time.

All of the works mentioned above treated the service providers as trustworthy devices and did not consider the security and privacy of task offloading in VEC. In this work, we exploit blockchain technology to guarantee security and reliability in vehicular computation offloading.
3884 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 22, NO. 7, JULY 2023
3 BLOCKCHAIN-ENABLED VEC
In this section, we will demonstrate the architecture of our
proposed blockchain-enabled VEC system in Sections 3.1
and 3.2, and further describe our proposed smart contract
for vehicular computation offloading and the consensus
scheme for the blockchain-enabled VEC in Sections 3.3 and
3.4, respectively. For ease of reference, the key notations in
this paper are summarized in Table 1.
Since the position of BSs is stationary and BSs serve nearby vehicles all the time, BSs can be used to collect vehicular computing service requests, observe the states of the vehicles inside the communication range, and allocate vehicular computing tasks to vehicles with idle computing resources. If a vehicle needs to offload part of its computing tasks due to the limited onboard computing resources, it first sends a service request to the nearby BS, and we call this vehicle as the task vehicle, and call the vehicles in the communication range of the task vehicle as the service vehicles. Then, the BS performs the vehicular task allocation and selects some of the service vehicles to execute the computing tasks from the task vehicle. During the process of vehicular task allocation, the BS can collect massive experiences in terms of vehicular task offloading which can be used for training the DRL-based algorithm of vehicular task allocation. Moreover, to motivate service vehicles to contribute their idle computing resources, the task vehicle pays a service price for each offloading task, and the selected service vehicle allocates part of its computing resources for the offloading task according to the service price. Furthermore, BS can use the pre-trained DRL model and perform the algorithm of task allocation to dynamically select the service vehicles and the service price according to the states of vehicles in the communication range of the BS.

3.2 Blockchain Model
Due to the high dynamic vehicular environment, a task vehicle driving on the road may have different neighboring service vehicles in different time slots, and the task vehicle may not obtain much information from all of the neighboring vehicles in real time, which makes it difficult to evaluate the willingness of neighboring vehicles to contribute their computing resources. In some cases, a service vehicle agrees to accept the service request from the task vehicle, but the computing resources that the service vehicle allocates for the offloading task cannot ensure the task is completed within the deadline, which leads to an offloading failure. Therefore, it is necessary to construct an authentication mechanism for vehicles in VEC. Blockchain is deemed as a promising solution, where the historical transactions of vehicular task offloading are stored and utilized to evaluate the reliability of vehicles in resource allocation. Additionally, considering that public blockchain requires massive computation and time cost in the consensus process, which is not suitable for vehicular networks, we thus employ a lightweight consortium blockchain in the VEC system, where only a part of BSs are authorized to participate in the blockchain consensus. Compared to the public blockchain, there are fewer consensus nodes in the consortium blockchain, and the computation complexity of blockchain consensus is lower.

In the blockchain-enabled VEC, considering the fast-changing situation in vehicular networks due to the mobility of vehicles, the blockchain in the VEC system is maintained by stationary BSs, and the transactions in terms of vehicular task offloading are recorded by BSs as well. Moreover, the transactions are grouped and stored in a persistent, immutable, and tamper-proof ledger, and are verified by the consensus before attaching to the blockchain. Furthermore, the traceability of V2V computation offloading in blockchain makes it possible for guaranteeing the security and reliability of vehicles and BSs. Specifically, when a BS performs the vehicular task allocation, it first collects the information of vehicles in the communication range of the BS and finds the reliability of the vehicles by looking up the table of reliability values stored in the BS with vehicle IDs, and then selects the service vehicle and service price by a DRL-based task allocation algorithm. The table of reliability values is periodically updated according to the verified transactions in terms of vehicular task offloading. In addition, to motivate BSs to perform vehicular task allocation appropriately, we also evaluate the reliability of BSs according to the performance in vehicular task allocation and regard the reliability of BSs as a basis in the consensus nodes selection.

3.3 Smart Contract
In the blockchain-enabled VEC, the smart contract is a predefined script [10], which is utilized to perform the computing task allocation among vehicles and record the events of task offloading automatically. To incentivize vehicles to share their idle computing resources, a dynamic pricing scheme is exploited in vehicular task offloading, where service vehicles allocate their idle computing resources according to the received service price. Furthermore, a DRL-based algorithm of vehicular task allocation is developed to dynamically select the service vehicle and determine the service price for each offloading task from task vehicles. Once a task vehicle sends a task offloading request to the BS, the smart contract in the BS processes the offloading request and triggers the DRL-based task allocation algorithm deployed in the BS. After the task is completed by the service vehicle, the BS verifies the execution result. If the result is valid and the offloading delay does not exceed the deadline of the task, the task vehicle pays the service price to the service vehicle, and a transaction that records the task offloading event is generated in the BS. The detailed algorithm of task allocation is presented in Section 4.

3.4 Consensus Process
After each offloading task is completed, the offloading information is recorded as a transaction in the blockchain system, which includes the IDs of the task vehicle and service vehicle, the task profile, the offloading delay, the execution result of the task, and the timestamp. In each round of blockchain consensus, newly generated transactions are verified by blockchain nodes and permanently stored in blocks, and the verified transactions can be used for evaluating the reliability of service vehicles and BSs. Considering that the accuracy of the estimated reliability depends on the timeliness of the transactions, the delay of transaction verification should be optimized in the blockchain consensus. We thus propose a consensus algorithm to improve the efficiency of transaction verification. In the following, we will demonstrate how to implement the consensus in the blockchain system.

Due to the high dynamic nature of vehicular networks, the consensus delay of the blockchain should be short enough so that the information of vehicular computation offloading can be updated in time. Some existing consensus protocols, such
as Proof-of-Work (PoW), Proof-of-Stake (PoS), suffer from a long time in the consensus process, which is not suitable for the VEC scenario. Therefore, in the consensus of the blockchain-enabled VEC, we employ the PBFT protocol [28], [29], which has been widely applied in consortium blockchains and can achieve less consensus latency than PoW and PoS. In the traditional PBFT consensus algorithm [30], the consensus nodes are predetermined and the primary node is selected from the consensus nodes in round-robin order. Since the consensus performance is influenced by the number of consensus nodes, we then propose an enhanced consensus scheme based on PBFT to optimize the performance of blockchain consensus, where the primary node and replica nodes are dynamically selected from BSs in a certain area according to the state of BSs, and the block size is determined according to the state of BSs as well. In the consensus scheme, the primary node and replica node are responsible for block generation and block verification, respectively, and the unselected BSs act as the normal nodes, which only record task offloading results as transactions in the blockchain and do not participate in the block generation and verification. Additionally, the blockchain consensus employs signature and message authentication code (MAC) to ensure the integrity and authentication of a transaction. In the consensus, signing a block or transaction, verifying a signature, generating a MAC, and verifying a MAC require $a$, $b$, $c$, $c$ CPU cycles [31], [32], respectively. Moreover, to guarantee the security of the blockchain system, the consensus allows at most $f = \lfloor (N_c - 1)/3 \rfloor$ consensus nodes are faulty [29]. Finally, the main steps of the consensus are shown as follows.

3.4.1 Consensus Nodes Selection
At the beginning of each round of consensus, the primary node in the previous round of consensus performs the algorithm of consensus nodes selection to select one primary node and multiple replica nodes from BSs in a given area, and the other BSs in the area are the normal nodes. The available computing resources for the consensus in the BS and the reliability of the BS in vehicular task allocation are the two main factors that affect the consensus nodes selection. In other words, the BS with higher reliability and more computing resources will have a higher probability of being selected as a consensus node. In each round of consensus, each consensus node will gain a reward for the contribution to the blockchain consensus. Therefore, to gain more rewards from the blockchain, each BS tends to improve its reliability in vehicular task allocation, which prevents some malicious BSs from giving inappropriate or unfair policies in vehicular task allocation. The detailed algorithm of consensus nodes selection is presented in Section 5.

MACs from the $M_t$ transactions. The computation cost of primary node is $M_t(b + c)$.

3.4.3 Pre-Prepare
In this step, the primary node generates one signature and one MAC for the unvalidated block and generates $N_c - 1$ MACs for the pre-prepare message. The pre-prepare message that contains the block is then sent to other replica nodes. Each replica node verifies one MAC from the block, and $M_t$ MACs and $M_t$ signatures from the transactions in the block. Then, the computation cost of the primary node is $a + N_c c$, and the computation cost of each replica node is $c + M_t(b + c)$. In addition, without loss of generality, we assume that the transmission time of a block is proportional to the block size. Then, the time of message delivery is $\tau_{bk} S_{bk}$.

3.4.4 Prepare
Once verifying the pre-prepare message, replica nodes send the prepare message to other consensus nodes. After each replica node collects $2f$ prepare messages matched with the pre-prepare message, the consensus enters the next step. In this step, the primary node needs to verify $2f$ MACs, every replica node needs to generate $N_c - 1$ MACs and verify $2f$ MACs. Therefore, the computation cost of the primary node is $2fc$, and the computation cost of each replica node is $(N_c - 1 + 2f)c$. The time of message delivery is $\tau_{bk} S_{bk}$.

3.4.5 Commit
Upon receipt of $2f$ matching prepare messages, each consensus node sends a commit message to the other consensus nodes. In this step, the primary node and each replica node verify $2f$ MACs and generate $N_c - 1$ MACs. The computation cost of the primary node and each replica node is $(N_c - 1 + 2f)c$. The time of message delivery is $\tau_{bk} S_{bk}$.

3.4.6 Reply
Upon receipt of $2f$ matching commit messages, replica nodes accept the block as valid and append the verified block to its local ledger, and send the reply message containing the validated block to other BSs. Then, each BS updates its global view from the verified block after receiving $f + 1$ matching reply messages. In this step, the primary node and each replica node generate one MAC for the reply message, the computing cost of each node is $c$. The time of message delivery is $\tau_{bk} S_{bk}$.

The main process of consensus is shown in Fig. 2. Finally, the computation time of the primary node is

$$T_V^p = \frac{M_t(b + c) + a + (2N_c + 4f)c}{G_p}, \tag{1}$$
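As a rough check on the per-step costs listed in Sections 3.4.3 to 3.4.6, the following Python sketch sums the CPU cycles of the primary node over the consensus steps and compares the total with the numerator of (1). The function names, the step labels (in particular `tx_check` for the step whose cost is stated as $M_t(b + c)$), the interpretation of $G_p$ as CPU cycles per second, and the sample values are assumptions made for illustration only.

```python
def max_faulty(n_c: int) -> int:
    """Largest tolerated number of faulty consensus nodes: floor((N_c - 1) / 3)."""
    return (n_c - 1) // 3


def primary_cycles(m_t: int, n_c: int, a: float, b: float, c: float) -> dict:
    """CPU cycles of the primary node per step (step labels are hypothetical)."""
    f = max_faulty(n_c)
    return {
        "tx_check": m_t * (b + c),        # cost M_t(b + c) stated before Section 3.4.3
        "pre_prepare": a + n_c * c,       # 1 signature + 1 MAC + (N_c - 1) MACs
        "prepare": 2 * f * c,             # verify 2f MACs
        "commit": (n_c - 1 + 2 * f) * c,  # generate N_c - 1 MACs, verify 2f MACs
        "reply": c,                       # generate 1 MAC
    }


def replica_cycles(m_t: int, n_c: int, a: float, b: float, c: float) -> dict:
    """CPU cycles of one replica node per step, as listed in the text."""
    f = max_faulty(n_c)
    return {
        "pre_prepare": c + m_t * (b + c),  # 1 block MAC + M_t MACs + M_t signatures
        "prepare": (n_c - 1 + 2 * f) * c,
        "commit": (n_c - 1 + 2 * f) * c,
        "reply": c,
    }


def primary_time(m_t: int, n_c: int, a: float, b: float, c: float, g_p: float) -> float:
    """Closed form of (1); g_p is assumed to be in CPU cycles per second."""
    f = max_faulty(n_c)
    return (m_t * (b + c) + a + (2 * n_c + 4 * f) * c) / g_p


if __name__ == "__main__":
    # Arbitrary sample values, not taken from the paper.
    params = dict(m_t=100, n_c=7, a=2.0, b=8.0, c=1.0)
    print(primary_cycles(**params))
    print(sum(primary_cycles(**params).values()), primary_time(g_p=1.0, **params))
```

With these sample values the step-by-step sum and the closed form agree, which is a quick way to confirm that the coefficient $(2N_c + 4f)$ in (1) collects the MAC operations of the pre-prepare, prepare, commit, and reply steps.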
$$T_F = T_I + T_D + T_V, \tag{3}$$

$$T_F \le \delta T_I, \tag{4}$$

where $\delta > 1$ is a constant. In (4), $T_F$ contains block generating interval $T_I$ and the time of block verification, and $T_F \le \delta T_I$ ensures that the time of block verification should be less than a threshold so that newly generated blocks can be verified in each round of consensus, and the threshold is determined by $T_I$.

3.5 Process of Blockchain-Enabled VEC
The main process of our proposed blockchain-enabled VEC system is presented in Fig. 3. In the VEC system, a task vehicle that has several offloading tasks first sends the offloading request to the nearby BS, then the smart contract in the BS is triggered. The BS collects the information of the vehicles surrounding the task vehicle and evaluates the reliability of the vehicles according to the transactions stored in the consortium blockchain, and then selects a service vehicle for task offloading. After the V2V computation offloading is finished, a transaction that records the result of the computation offloading is generated and verified by the blockchain consensus. In each round of consensus, the consensus nodes are selected from the BSs in the VEC system and then perform the PBFT-based consensus. Finally, the verified transactions are stored in the blockchain system and can be further used for evaluating the reliability of vehicles and BSs.

4 SMART CONTRACT FOR V2V COMPUTATION OFFLOADING
In this section, we will detail our proposed smart contract for vehicular task allocation in the blockchain-enabled VEC system. In the system, the problem of vehicular task allocation in BS is a sequential decision problem, which can be formulated as a Markov decision process (MDP). To solve the problem, we develop a DRL-based algorithm of vehicular task allocation to improve the efficiency and the reliability of vehicular computation offloading.

We first consider the VEC scenario illustrated in Fig. 1, where multiple vehicles drive on a one-way road, and several BSs are distributed at the roadside. Similar to [34], we utilize a free flow model to characterize the traffic model in the communication range of a BS, then the relationship between the average velocity of vehicles $v$ and traffic density $\rho$ is

$$v = v_{\max}\left(1 - \frac{\rho}{\rho_{\max}}\right), \tag{5}$$

where $v_{\max}$ and $\rho_{\max}$ are the maximum velocity of vehicles and maximum traffic density, respectively.

4.1 Computation Model
All of the offloading tasks in the task vehicle are assumed to be computation-intensive and are offloaded independently in VEC. The profile of an offloading task is denoted as $\{D_n, C_n, \tau_n\}$, where $D_n$ is the data size of the task, $C_n$ is the computation size which represents the required CPU cycles for completing the task, $\tau_n$ is the maximum tolerable delay of the task. In many cases, the execution result of a task is much smaller than data size $D_n$, we thus ignore the transmission time of execution result in the calculation of offloading delay. Then, the offloading delay of a task is

$$t_n = \frac{D_n}{r_{ts}} + \frac{C_n}{f_n}, \tag{6}$$

where $r_{ts}$ is the transmission rate of the V2V link between task vehicle $V_t$ and service vehicle $V_s$, $f_n$ is the allocated computing resources for the offloading task in the service vehicle. Similar to [35], we utilize the channel capacity to estimate the V2V transmission rate, which can be given as

$$r_{ts} = W_{ts} \log_2\left(1 + \frac{P_t d_{ts}^{-\alpha} h_{ts}^2}{\sigma_w^2 + I_{ts}}\right), \tag{7}$$

where $W_{ts}$ is the allocated bandwidth of the V2V link between $V_t$ and $V_s$, $P_t$ is the transmission power of $V_t$, $d_{ts}$ is the distance between $V_t$ and $V_s$, $\alpha$ is a constant which represents the path loss factor, $h_{ts}$ is the channel gain of the V2V link, $\sigma_w^2$ represents the power spectrum density of additive white Gaussian noise, and $I_{ts}$ denotes the interference introduced by other V2V transmissions.
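The traffic and delay model in (5), (6), and (7) can be evaluated numerically with a short Python sketch. The function names, parameter names, and units below are assumptions chosen for illustration; only the three formulas themselves come from the text.

```python
import math


def free_flow_speed(rho: float, rho_max: float, v_max: float) -> float:
    """Average vehicle velocity from traffic density, following (5)."""
    return v_max * (1.0 - rho / rho_max)


def v2v_rate(bandwidth: float, tx_power: float, distance: float, alpha: float,
             channel_gain: float, noise: float, interference: float) -> float:
    """V2V transmission rate estimated by the channel capacity, following (7)."""
    sinr = tx_power * distance ** (-alpha) * channel_gain ** 2 / (noise + interference)
    return bandwidth * math.log2(1.0 + sinr)


def offloading_delay(data_size: float, cycles: float, rate: float, f_alloc: float) -> float:
    """Offloading delay t_n = D_n / r_ts + C_n / f_n, following (6).

    The return time of the execution result is ignored, as in the text.
    """
    return data_size / rate + cycles / f_alloc


if __name__ == "__main__":
    # Arbitrary sample values (assumed units: bits, Hz, CPU cycles, cycles/s).
    rate = v2v_rate(bandwidth=1e6, tx_power=1.0, distance=1.0, alpha=3.0,
                    channel_gain=1.0, noise=0.5, interference=0.5)
    print(free_flow_speed(rho=50.0, rho_max=100.0, v_max=30.0))
    print(offloading_delay(data_size=1e6, cycles=1e9, rate=rate, f_alloc=1e9))
```

A task is completed in time when the returned delay does not exceed its maximum tolerable delay $\tau_n$, so this delay is the quantity a BS would compare against the deadline when allocating tasks.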
Particularly, due to the short V2V link duration and limited vehicular computing capability, we do not consider the queuing time in the task offloading, i.e., if a V2V link interruption occurs in the process of task offloading, the task offloading will be regarded as a failure, and the utility of the task will be negative as a penalty. Then, the task vehicle needs to resubmit a new service request to BS for the task.

4.2 Task Vehicle Model
When a task vehicle offloads a task to a service vehicle, it needs to pay the service vehicle the service price, and we assume that the service price of performing a computing task in a service vehicle is proportional to the computation size of the task, which is represented as $p_n C_n$, where $p_n$ denotes the unit price.

For a computing task $\phi_n$, similar to [36], [37], the utility is related to the completion time of the task, which is shown as

$$U_n = \begin{cases} \log(1 + \tau_n - t_n), & t_n \le \tau_n, \\ L, & t_n > \tau_n, \end{cases} \tag{8}$$

where $L < 0$ is a constant that represents the penalty of offloading failure. Then, the utility of $V_t$ for offloading a task can be given as

$$\mathcal{U}_n = U_n - p_n C_n. \tag{9}$$

4.3 Service Vehicle Model
In the process of task offloading, once a service vehicle receives a service request and corresponding service price, the service vehicle will contribute part of its computing resources for the offloading task. We consider a service vehicle $V_s$, and assume that the cost of executing a computing task is proportional to the energy consumption, and denote the computing resources allocated for task $\phi_n$ as $f_n$. Similar to [38], the energy consumption of executing task $\phi_n$ is calculated by

$$E_n = \kappa f_n^3 \frac{C_n}{f_n} = \kappa f_n^2 C_n, \tag{10}$$

where $\kappa > 0$ is a constant. Then, the utility of $V_s$ for executing task $\phi_n$ can be given as

$$U_s^n = p_n C_n - \varpi \kappa f_n^2 C_n, \tag{11}$$

where $\varpi$ is the coefficient that represents the price per unit of energy consumption.

Since the utility of a service vehicle must be non-negative, the range of $f_n$ is $[0, \min\{F_s, \sqrt{p_n/(\varpi\kappa)}\}]$, where $F_s$ is the maximum available computing resources in $V_s$, and $\sqrt{p_n/(\varpi\kappa)}$ is the upper bound of $f_n$ obtained from (11) by requiring $p_n C_n - \varpi \kappa f_n^2 C_n \ge 0$. Moreover, it can be seen from (11) that the utility of a service vehicle mainly depends on the unit service price and the allocated computing resources for task $\phi_n$. A service vehicle can increase the utility by allocating fewer computing resources for task $\phi_n$, but it will reduce the utility of task $\phi_n$ or even make the task not completed within the deadline. To prevent some malicious vehicles from allocating insufficient computing resources for offloading tasks, we further establish a reliability model to evaluate the efficiency of task offloading in a service vehicle.

We first define a normalized utility for an offloading task that can be completed within the deadline, which is shown as follows:

$$\tilde{U}_n = \frac{\log(1 + \tau_n - t_n)}{\log(1 + \tau_n)}, \tag{12}$$

where $\log(1 + \tau_n)$ represents the maximum utility of task $\phi_n$. After a service vehicle completes an offloading task, the computation efficiency of the service vehicle is updated by

$$\sigma = (1 - \omega_1)\sigma' + \omega_1 \tilde{U}_n, \quad \omega_1 \in (0, 1), \tag{13}$$

where $\sigma'$ represents the previous computation efficiency of $V_s$. Besides, the completion ratio of a service vehicle is updated by

$$\zeta_s = \frac{N_s \zeta_s' + \mathbb{1}(t_n \le \tau_n)}{N_s + 1}, \tag{14}$$

where $N_s$ represents the total number of received offloading tasks in $V_s$, $\zeta_s'$ denotes the previous completion ratio of $V_s$, and $\mathbb{1}(\cdot)$ is the indicator function. Then, the reliability of a service vehicle can be defined as

$$\eta_s = \omega_2 \sigma + (1 - \omega_2)\zeta_s, \quad \omega_2 \in (0, 1). \tag{15}$$

As a result, if a service vehicle cannot allocate appropriate computing resources for an offloading task according to the service price, the reliability of the service vehicle will be reduced. In our proposed V2V computation offloading algorithm, the reliability of service vehicles is an important factor that affects the selection of service vehicles and service price, a vehicle with low reliability usually has a low probability of being selected as the service vehicle for offloading tasks.

4.4 Problem Formulation: Vehicular Task Allocation
In the VEC system, we consider a task vehicle which is denoted as $V_t$. The service vehicles in the communication range of $V_t$ are denoted as $\mathcal{S} = \{V_1, \ldots, V_s, \ldots, V_S\}$. When $V_t$ sends a task offloading request to a BS, the smart contract in the BS is automatically triggered. We assume that there are $N$ offloading tasks from $V_t$ in a period, which are denoted as $\mathcal{N} = \{\phi_1, \ldots, \phi_n, \ldots, \phi_N\}$. For each offloading task, the BS dynamically chooses the service vehicle and determines the corresponding service price by observing the vehicular environment. After the task is completed by the service vehicle, the execution result is transmitted to $V_t$ and the BS, and then the BS verifies the result and sends the validity of the result to $V_t$. If the execution result is valid and the offloading delay is within the maximum tolerable delay of the task, $V_t$ will pay the service price to the service vehicle. Otherwise, the service vehicle will not obtain any payment from $V_t$. Then, the BS generates a transaction that records the task offloading event. In the V2V computation offloading, we formulate the problem of vehicular task allocation in the BS as an optimization problem with the objective of maximizing the utility of $V_t$ with the consideration of the mobility, the computing capability, and the reliability of vehicles. The optimization problem is formulated as follows:
SHI ET AL.: DRL-BASED V2V COMPUTATION OFFLOADING FOR BLOCKCHAIN-ENABLED VEHICULAR NETWORKS 3889
$$
\begin{aligned}
\max_{A} \quad & \frac{1}{N} \sum_{n=1}^{N} \sum_{s=1}^{S} x_n^s U_n, \\
\text{s.t.} \quad & \text{C1}: x_n^s \in \{0, 1\}, \quad \forall n \in \mathcal{N},\ s \in \mathcal{S}, \\
& \text{C2}: \sum_{s=1}^{S} x_n^s = 1, \quad \forall n \in \mathcal{N}, \\
& \text{C3}: x_n^s (l_s - t_n) \geq 0, \quad \forall n \in \mathcal{N},\ s \in \mathcal{S}, \\
& \text{C4}: x_n^s (\eta_s - \eta_0) \geq 0, \quad \forall n \in \mathcal{N},\ s \in \mathcal{S},
\end{aligned}
\tag{16}
$$

where $A = \{a_1, \ldots, a_n, \ldots, a_N\}$, and $a_n = \{x_n^1, \ldots, x_n^s, \ldots, x_n^S, p_n\}$. In constraint C1, $x_n^s = 1$ means that task $f_n$ is offloaded to $V_s$, and $x_n^s = 0$ otherwise. Constraint C2 guarantees that a computing task is only offloaded to one service vehicle. Constraint C3 ensures that the completion time of an offloading task should be less than the V2V link duration between $V_t$ and $V_s$, and the V2V link duration is estimated by

$$
l_s = \frac{D}{|v_t - v_s|} - \frac{z_t - z_s}{v_t - v_s}, \tag{17}
$$

where $z_t$, $v_t$ represent the position and velocity of the task vehicle, respectively, and $z_s$, $v_s$ represent the position and velocity of the service vehicle, respectively. Constraint C4 ensures that the reliability of the selected service vehicle is higher than threshold $\eta_0$.

In dynamic vehicular networks, the V2V link state, the idle computing resources, and the reliability of service vehicles are all time-variant, and the multi-task allocation can be regarded as a sequential decision problem. We reformulate the optimization problem in (16) as an MDP, and the system state space, action space, and reward are defined as follows.

4.4.1 State Space
The system state at time t can be defined as

$$
s_t = [F_1(t), \ldots, F_s(t), \ldots, F_S(t), \gamma_1(t), \ldots, \gamma_s(t), \ldots, \gamma_S(t), l_1(t), \ldots, l_s(t), \ldots, l_S(t), \eta_1(t), \ldots, \eta_s(t), \ldots, \eta_S(t), D(t), C(t), \tau(t)], \tag{18}
$$

where $F_s(t)$ represents the available computing resources in $V_s$, $\gamma_s(t)$ is the signal-to-noise ratio (SNR) of the V2V link between $V_t$ and $V_s$, $l_s(t)$ denotes the estimated V2V link duration between $V_t$ and $V_s$, $\eta_s(t)$ is the reliability of $V_s$, which is estimated by the historical transactions in terms of task offloading in the blockchain according to (15), and $D(t)$, $C(t)$, and $\tau(t)$ are the data size, computation size, and deadline of the offloading task at time t, respectively.

4.4.2 Action Space
The conducted action at time t can be given as

$$
a_t = [x_n^1(t), \ldots, x_n^s(t), \ldots, x_n^S(t), p_n(t)], \tag{19}
$$

where $x_n^s(t) = 1$ represents that service vehicle $V_s$ is selected to execute task $f_n$, and $x_n^s(t) = 0$ otherwise, and $p_n(t)$ denotes the unit service price of $f_n$ paid to the selected service vehicle.

4.4.3 Reward Function
According to the optimization objective presented in (16), we define the immediate reward after conducting action $a_t$ at time t as

$$
R_t = \sum_{s=1}^{S} x_n^s U_n, \tag{20}
$$

where $U_n$ is the utility of $V_t$ for offloading a task as defined in (9), which is related to the service vehicle and the service price.

4.5 SAC-Based Computation Offloading Algorithm
In the problem of vehicular task allocation, the action, which consists of the selection of service vehicles and the determination of the service price, contains both discrete decision variables and a continuous variable. We thus employ the soft actor-critic (SAC) [39] method to obtain the hybrid decision policy in vehicular task allocation. SAC is a reinforcement learning algorithm based on the off-policy actor-critic model. To be specific, the actor network in SAC is used to generate an action according to the observed system state and to improve the policy, while the critic network is responsible for evaluating the policy provided by the actor network. In the following, we will demonstrate how SAC works in task allocation.

4.5.1 Policy and Value Function
The main goal of SAC is to find an optimal policy that maximizes the expected long-term reward, and meanwhile maximizes the entropy of the policy to improve robustness and exploration. The policy is defined as an action-selection strategy which can be either deterministic or stochastic. In SAC, the policy is stochastic and is denoted as $\pi(a|s)$, which represents a probability distribution over actions given an observed state s. We then define a soft action-value function that represents the expected long-term reward combined with the expected entropy of policy $\pi$ when given an initial state s and action a, which is shown as

$$
Q^{\pi}(s, a) = \mathbb{E}\left[ \sum_{t=0}^{T-1} \nu^t R_t + \beta \sum_{t=1}^{T-1} \nu^t \mathcal{H}(\pi(\cdot|s_t)) \,\middle|\, s_0 = s, a_0 = a \right], \tag{21}
$$

where $\beta \in [0, 1]$ is the temperature parameter that controls the stochasticity of the policy, and $\nu$ is the discount factor. Further, we define a soft state-value function that denotes the expected long-term reward combined with the entropy of the policy starting from state s and taking actions following some policy $\pi$, which is given as follows:

$$
V^{\pi}(s) = \mathbb{E}\left[ \sum_{t=0}^{T-1} \nu^t \left( R_t + \beta \mathcal{H}(\pi(\cdot|s_t)) \right) \,\middle|\, s_0 = s \right]. \tag{22}
$$

Since the dimensions of both state space and action space could be extremely large with the increase of traffic density, we employ neural networks to approximate both the soft action-value function and the policy, which are denoted as $Q_\theta(s, a)$ and $\pi_\psi(a|s)$, respectively.
3890 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 22, NO. 7, JULY 2023
BSs to improve their reliability in vehicular task allocation, we model the consensus nodes and incorporate the incentive mechanism into the reward of consensus nodes in Section 5.1. Besides, considering that the available computing resources in BSs for consensus and the reliability of BSs are both time-variant, we employ a DRL-based algorithm in Section 5.3 so that the policy of consensus nodes selection can dynamically adapt to the state of the VEC system.

5.1 Consensus Node Model
In the blockchain-enabled VEC system, we assume that there are B BSs distributed in a certain area, which are denoted as $\mathcal{I} = \{I_1, \ldots, I_b, \ldots, I_B\}$. In each round of blockchain consensus, all the consensus nodes are selected by the primary node of the previous round of consensus. According to the analysis of blockchain consensus presented in Section 3.4, the performance of consensus mainly depends on the block size and the available computing resources in BSs. As the traffic density in the communication range of a BS increases, more vehicles send computing service requests to the BS. Without loss of generality, we assume that the arrival rate of vehicular service requests in a BS is proportional to the traffic density in the communication range of the BS. Then, the available computing resources for blockchain consensus in the BS can be given as

$$
G_b = G_b^{\text{total}} - (\tilde{G}_b + \mu \rho_b), \tag{28}
$$

where $G_b^{\text{total}}$ is the total computing capability of BS $I_b$, $\tilde{G}_b$ represents the computing resources reserved for some non-vehicular computing tasks in $I_b$, $\rho_b$ is the traffic density in the communication range of $I_b$, and $\mu > 0$ is a constant. Following a similar assumption on the block reward as in [38], we define the reward of a consensus node after completing a round of consensus as

$$
R_b(t) =
\begin{cases}
\varepsilon_1 \iota_b + \varepsilon_2 \dfrac{S_{bk}}{\chi} \log(1 + T - T_b^V), & T_b^V \leq T, \\
G, & T_b^V > T,
\end{cases}
\tag{29}
$$

where $\iota_b$ is the reliability of BS $I_b$, $\varepsilon_1$ and $\varepsilon_2$ are the weights of reliability and number of validated transactions, respectively, $T_b^V$ denotes the delay of block verification in $I_b$, $T = \delta T^I - (T^I + T^D)$ represents the maximum tolerable delay of block verification in $I_b$, and $G < 0$ is a constant that represents the penalty. It can be seen from (29) that the reward of a consensus node is simultaneously determined by the block size and the reliability: the BS that validates more transactions and has higher reliability will get more reward in the consensus process. Therefore, BSs are motivated to improve their reliability in vehicular task allocation and contribute more computing resources for blockchain consensus. After a round of consensus is completed, the average reward obtained by the consensus nodes from blockchain is

$$
R(t) = \frac{1}{N_c} \sum_{b=1}^{N_c} R_b(t). \tag{30}
$$

5.2 Problem Formulation: Consensus Nodes Selection
Our aim is to maximize the expected long-term reward of consensus nodes. According to the reward of consensus nodes defined in (30), we formulate the optimization problem as

$$
\begin{aligned}
\max_{A'} \quad & \sum_{t'=t}^{T} \varrho^{\,t'-t} R(t'), \\
\text{s.t.} \quad & \text{C1}: y_b(t') \in \{0, 1\}, \quad \forall b \in \mathcal{I}, \\
& \text{C2}: \sum_{b=1}^{B} y_b(t') = N_c, \\
& \text{C3}: y_b(t') (G_b - G) \geq 0, \quad \forall b \in \mathcal{I}, \\
& \text{C4}: y_b(t') (\iota_b(t') - L) \geq 0, \quad \forall b \in \mathcal{I}, \\
& \text{C5}: T^F - \delta T^I \leq 0,
\end{aligned}
\tag{31}
$$

where $A' = \{y_1(t'), \ldots, y_b(t'), \ldots, y_B(t'), S_{bk}(t')\}$, and $\varrho \in (0, 1]$ is the reward discount factor. In constraint C1, $y_b(t') = 1$ indicates that BS $I_b$ is selected as a consensus node at time $t'$, and $y_b(t') = 0$ otherwise. Constraint C2 indicates that the total number of consensus nodes is $N_c$. Constraint C3 guarantees that the available computing resources in a consensus node should be higher than threshold $G$. Constraint C4 ensures that the reliability of a consensus node should be higher than threshold $L$. In constraint C5, the consensus delay should satisfy the constraint in (4).

In the consensus nodes selection, the available computing resources for blockchain consensus in each BS depend on the traffic density in the communication range of the BS, the reliability of each BS depends on its performance in vehicular task allocation, and both are time-variant. Moreover, the consensus nodes selection is a joint decision-making problem, where the block size and multiple consensus nodes are jointly determined in each round of consensus, which is difficult to solve with traditional optimization methods. In the following, we will transform the problem into an MDP and demonstrate how to solve it by using a DRL-based method.

5.2.1 State Space
In each period, the system state of all the BSs is denoted as

$$
s_t = [G_1(t), \ldots, G_b(t), \ldots, G_B(t), \iota_1(t), \ldots, \iota_b(t), \ldots, \iota_B(t), N_{tr}(t)], \tag{32}
$$

where $G_b(t)$ is the available computing resources for blockchain consensus in BS $I_b$ at time t, $N_{tr}(t)$ represents the number of transactions generated by BSs in a block interval, and $\iota_b(t)$ represents the reliability of $I_b$ in vehicular task allocation, which is evaluated by the average normalized utility defined in (12) and the completion ratio of the vehicular offloading tasks allocated by the BS, and the calculation of $\iota_b(t)$ is the same as (15). It is worth noting that the reliability of a BS reflects the task allocation in the BS, which depends on the selection of service vehicle and service price, while the reliability of a service vehicle reflects the task execution in the service vehicle, which depends on the amount of computing resources allocated for offloading tasks.
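The consensus-node reward in (29) and its average in (30), which form the objective of the selection problem above, can be sketched as follows. This is a minimal illustration under stated assumptions: the function and argument names are hypothetical, the logarithm is taken as the natural logarithm (the source does not state the base), the block size and transaction size are assumed to share one unit, and all numeric values in the usage lines are made up for the example.

```python
import math


def consensus_reward(iota_b, s_bk, chi, t_verify, t_max, eps1, eps2, penalty):
    """Reward of one consensus node after a round of consensus, following (29).

    iota_b   : reliability of the BS
    s_bk     : block size (same unit as chi)
    chi      : size of one transaction, so s_bk / chi is the number of
               validated transactions
    t_verify : block verification delay of the BS
    t_max    : maximum tolerable block verification delay
    penalty  : negative constant returned when verification is too slow
    """
    if t_verify > t_max:
        return penalty
    n_validated = s_bk / chi
    return eps1 * iota_b + eps2 * n_validated * math.log(1.0 + t_max - t_verify)


def average_reward(node_rewards):
    """Average reward over the selected consensus nodes, following (30)."""
    return sum(node_rewards) / len(node_rewards)


# Hypothetical values: a 4096 KB block of 2 KB transactions.
on_time = consensus_reward(0.8, 4096, 2, 0.5, 1.5, eps1=1.0, eps2=0.001, penalty=-1.0)
too_late = consensus_reward(0.8, 4096, 2, 2.0, 1.5, eps1=1.0, eps2=0.001, penalty=-1.0)
round_reward = average_reward([on_time, too_late])
```

A node that verifies faster leaves a larger margin inside the logarithm, so the reward grows with both the number of validated transactions and the remaining delay budget, while a node that exceeds the budget only receives the penalty.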
5.2.2 Action Space
We represent the action conducted by the agent at time t as

$$
a_t = [y_1(t), \ldots, y_b(t), \ldots, y_B(t), S_{bk}(t)],
$$

where $y_b(t) = 1$ indicates that BS $I_b$ is selected as a consensus node, and $y_b(t) = 0$ otherwise. $S_{bk}(t)$ denotes the size of the unverified block, and $S_{bk}(t) \in \{0.2, 0.4, \ldots, \dot{S}_{bk}\}$, where $\dot{S}_{bk}$ is the maximum block size in the blockchain. Furthermore, each consensus node has the potential to be selected as the primary node, and the other consensus nodes are the replica nodes. Among the consensus nodes chosen from $a_t$, we select the consensus node that maximizes $R(t)$ as the primary node.

5.2.3 Reward Function
According to the optimization objective defined in (31), the immediate reward in a round of consensus can be given as

main Q-network. In the target Q-network, the target Q-value can be calculated as follows:

Since we aim to make the Q-value of the main Q-network close to the target Q-value, the loss function of the main Q-network is defined as follows:

$$
J(\theta) = \frac{1}{|E'|} \sum_{i=1}^{|E'|} \left( z_i - Q(s_i, a_i; \theta) \right)^2. \tag{39}
$$

In each period, the agent samples a batch of experiences $E'$ from the replay buffer, and trains the main Q-network by minimizing the loss function in (39). Finally, after every N steps, the weights of the target Q-network are updated with an exponential moving average of the main Q-network weights $\theta$. The detailed algorithm is presented in Algorithm 2.
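The loss in (39) and the exponential-moving-average update of the target Q-network can be sketched as below. This is a simplified illustration, not the authors' Algorithm 2: the target Q-values $z_i$ are taken as given inputs, network weights are represented as flat lists of floats, and the averaging rate `rate` is a hypothetical parameter whose value the text does not specify.

```python
def dqn_loss(targets, q_values):
    """Mean squared error between target Q-values z_i and the main-network
    outputs Q(s_i, a_i; theta) over a sampled batch, following (39)."""
    if not targets or len(targets) != len(q_values):
        raise ValueError("targets and q_values must be non-empty and equal in length")
    return sum((z - q) ** 2 for z, q in zip(targets, q_values)) / len(targets)


def ema_update(target_weights, main_weights, rate):
    """Move each target-network weight a fraction `rate` of the way toward
    the corresponding main-network weight (exponential moving average)."""
    return [(1.0 - rate) * w_target + rate * w_main
            for w_target, w_main in zip(target_weights, main_weights)]


# Hypothetical batch of two experiences and a two-weight network.
loss = dqn_loss(targets=[1.0, 2.0], q_values=[0.0, 4.0])
new_target = ema_update([0.0, 1.0], [1.0, 1.0], rate=0.1)
```

Keeping the target weights as a slowly moving average of the main weights makes the targets $z_i$ change gradually between updates, which is the usual motivation for a separate target network.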
TABLE 2
Simulation Parameters

Parameter                 Value
$\rho$                    5–40 vehicles/km
$F_s$                     [3, 7] GHz
$G_b^{\text{total}}$      [20, 29] GHz
$D$                       500 m
$W_{ts}$                  10 MHz
$D_n$                     [0.2, 4] Mbits
$C_n$                     [0.2, 3.2] × 10^9 cycles
$\tau_n$                  {0.5, 1, 2, 4} s
$\chi$                    2 KB
$\dot{S}_{bk}$            8 MB [42]
$a$                       2 Mcycles
$b$                       8 Mcycles [31]
$c$                       0.5 Mcycles [31]

simulations with different numbers of consensus nodes $N_c$, and the range of $N_c$ is set as [6, 30]. The other parameters in our simulation are summarized in Table 2.

6.2 Performance of V2V Computation Offloading
In the simulation of the vehicular computation offloading algorithm, actor network $\pi_\psi(a|s)$ and critic networks $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$ are all fully-connected deep neural networks (DNNs) with two hidden layers. We set the size of the hidden layers in $\pi_\psi(a|s)$ as (800, 500), and set the learning rate of $\pi_\psi(a|s)$ as $8 \times 10^{-4}$. Similarly, we set the size of the hidden layers in both $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$ as (800, 500), and the learning rate of the two networks is set as $8 \times 10^{-4}$. In addition, the batch size is set as $|E| = 256$, and the delay factor is set as $\omega_s = 0.005$.

The complexity of the vehicular computation offloading algorithm is commensurate with that of the DNN model. Given a VEC scenario where the number of service vehicles is S, the dimension of the state space is $4S + 3$ and the dimension of the action space is $2S$ according to (18) and (19). Then, the complexity of action generation in the algorithm can be given as $O((4S + 3)h_1 + h_1 h_2 + 2 h_2 S)$, where $(h_1, h_2)$ is the size of the hidden layers in both actor network and critic network. In our simulations, the size of the hidden layers in actor and critic is fixed, so the complexity of action generation can be given as $O(S)$. Besides, in the training process of the algorithm, the complexity in each step is $O(S)$ as well. If there are $N_o$ offloading tasks in a period, and the training steps in a period are $N_t$, then the time complexity of the algorithm in a period is $O((N_o + N_t)S)$.

To verify the performance of our proposed algorithm, we introduce the following algorithms for comparison:

Fig. 4. The average utility of task vehicle for offloading a task in the proposed algorithm with different learning rates.

rates. It can be seen that the average utility reaches convergence around 3,000 training episodes, and the algorithm with learning rates of $8 \times 10^{-4}$ for both the critic and the actor achieves the best performance compared with the algorithm with other learning rates.

As shown in Fig. 5, the average utility of task vehicle for offloading a task in our proposed algorithm is higher than that in GTO and RTO, because the optimization objective of our proposed algorithm is to maximize the expected long-term reward, and the reward is related to the utility of task vehicle, while in GTO, the agent always chooses the service vehicle with the maximum idle computing resources in each step, which may not obtain the maximum average utility of offloading tasks. Moreover, GTO does not consider the reliability of service vehicles in task offloading, so some tasks may be offloaded to service vehicles with low reliability and obtain fewer computing resources, which leads to the lower average utility of the task vehicle.

In Fig. 6, the completion ratio of offloading tasks by applying the three algorithms is presented. It can be seen that the completion ratio of offloading tasks by applying our proposed algorithm is higher than by applying GTO and RTO. Furthermore, with the traffic density increasing, there are more service vehicles available for task offloading, thus offloading tasks may obtain more computing resources from service vehicles, which increases both the average utility and the completion ratio of offloading tasks.

Figs. 7a and 7b present the average delays of the offloading tasks that have been successfully completed under the traffic density 10 vehicles/km and 30 vehicles/km, respectively. We can observe that under low traffic density, the
Fig. 6. The completion ratio of offloading tasks under different traffic densities.

Fig. 8. The average utility per period and the selection probability of service vehicles (SVs) with different reliabilities when the traffic density is 30 vehicles/km.
[37] X. Huang, R. Yu, J. Kang, and Y. Zhang, “Distributed reputation management for secure and efficient vehicular edge computing and networks,” IEEE Access, vol. 5, pp. 25408–25420, 2017.
[38] M. Liu, F. R. Yu, Y. Teng, V. C. M. Leung, and M. Song, “Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 695–708, Jan. 2019.
[39] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 1861–1870.
[40] T. Haarnoja et al., “Soft actor-critic algorithms and applications,” 2018, arXiv:1812.05905.
[41] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 2094–2100.
[42] J. Feng, F. R. Yu, Q. Pei, X. Chu, J. Du, and L. Zhu, “Cooperative computation offloading and resource allocation for blockchain-enabled mobile-edge computing: A deep reinforcement learning approach,” IEEE Internet Things J., vol. 7, no. 7, pp. 6214–6228, Jul. 2020.

Jian Wang (Senior Member, IEEE) received the PhD degree in electronic engineering from Tsinghua University, Beijing, China, in 2006. He is currently a professor with the Department of Electronic Engineering, Tsinghua University. His research interests include application of statistical theories, optimization, machine learning to communication, networking, navigation, and resource allocation problems.

Jian Yuan (Member, IEEE) received the PhD degree in electrical engineering from the University of Electronic Science and Technology of China, in 1998. He is currently a professor with the Department of Electronic Engineering at Tsinghua University, Beijing, China. His research focuses on complex dynamics of networked systems.