DRL-Based V2V Computation Offloading For Blockchain-Enabled Vehicular Networks

 This document proposes a blockchain-enabled vehicular edge computing (VEC) framework to improve the reliability and efficiency of vehicle-to-vehicle (V2V) task offloading. A deep reinforcement learning (DRL)-based computation offloading scheme is developed for the smart contract of blockchain, allowing task vehicles to offload computation-intensive tasks to neighboring vehicles.  Blockchain is utilized to ensure security and reliability in task offloading by evaluating the reliability of vehicles in resource allocation. An enhanced consensus algorithm and consensus nodes selection algorithm are also proposed to improve the efficiency of consensus and motivate base stations to improve reliability in task allocation.  Simulation results validate the effectiveness of the proposed blockchain-enabled VEC


3882 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 22, NO. 7, JULY 2023

DRL-Based V2V Computation Offloading for Blockchain-Enabled Vehicular Networks
Jinming Shi, Student Member, IEEE, Jun Du, Senior Member, IEEE,
Yuan Shen, Senior Member, IEEE, Jian Wang, Senior Member, IEEE,
Jian Yuan, Member, IEEE, and Zhu Han, Fellow, IEEE

Abstract—Vehicular edge computing (VEC) is an effective method to increase the computing capability of vehicles, where vehicles
share their idle computing resources with each other. However, due to the high mobility of vehicles, it is challenging to design an optimal
task allocation policy that adapts to the dynamic vehicular environment. Further, vehicular computation offloading often occurs between
unfamiliar vehicles, so motivating vehicles to share their computing resources while guaranteeing the reliability of resource allocation
in task offloading is a major challenge. In this paper, we propose a blockchain-enabled VEC framework to ensure the reliability and
efficiency of vehicle-to-vehicle (V2V) task offloading. Specifically, we develop a deep reinforcement learning (DRL)-based computation
offloading scheme for the smart contract of blockchain, where task vehicles can offload part of computation-intensive tasks to
neighboring vehicles. To ensure the security and reliability in task offloading, we evaluate the reliability of vehicles in resource allocation
by blockchain. Moreover, we propose an enhanced consensus algorithm based on practical Byzantine fault tolerance (PBFT), and
design a consensus nodes selection algorithm to improve the efficiency of consensus and motivate base stations to improve reliability
in task allocation. Simulation results validate the effectiveness of our proposed scheme for blockchain-enabled VEC.

Index Terms—Vehicular edge computing (VEC), computation offloading, blockchain, deep reinforcement learning (DRL)

• Jinming Shi, Jun Du, Yuan Shen, Jian Wang, and Jian Yuan are with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China. E-mail: [email protected], [email protected], {shenyuan_ee, jian-wang, jyuan}@tsinghua.edu.cn.
• Zhu Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul 446-701, South Korea. E-mail: [email protected].

Manuscript received 13 Dec. 2021; accepted 14 Feb. 2022. Date of publication 23 Feb. 2022; date of current version 5 June 2023.
This work was supported in part by the National Natural Science Foundation of China under Grant 61971257, and in part by the Young Elite Scientist Sponsorship Program by CAST under Grants 2020QNRC001, NSF CNS-2107216, and EARS-1839818.
(Corresponding author: Jian Wang.)
Recommended for acceptance by W. Liao.
Digital Object Identifier no. 10.1109/TMC.2022.3153346
1536-1233 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

1 INTRODUCTION
With the development of autonomous driving and Internet of Vehicles (IoV) technologies, various vehicular applications have been emerging to improve traffic safety and driving experience. However, because of the limited onboard computing and storage resources, some computation-intensive tasks cannot be performed within their deadlines and need the assistance of the central cloud or edge computing [1], [2]. Due to the high transmission delay between vehicles and the central cloud, vehicular edge computing (VEC) is commonly exploited to improve the computing capability of vehicles, where vehicles can transmit computing tasks to either a base station (BS) or neighboring vehicles. However, as the traffic density in the coverage of a BS increases, the limited computing capability of the BS cannot satisfy all the vehicular computing requirements at the same time. Vehicle-to-vehicle (V2V) computation offloading is becoming a promising mechanism that can effectively alleviate the workload of the BS, where task vehicles with limited computing resources can offload some computing tasks to neighboring vehicles with idle computing resources [3]. Moreover, compared to the fixed-position BS, the link duration between two vehicles driving in the same direction is longer because the relative velocity between the two vehicles is smaller.

However, most studies on VEC focused on computing resource allocation and on incentives for sharing computing resources among vehicles, whereas the security and trust issues in VEC were seldom considered. Due to the high mobility of vehicles, V2V computation offloading often occurs between unfamiliar vehicles. Service vehicles may not allocate sufficient computing resources for task vehicles, so the offloading tasks may not be completed within the maximum tolerable delay. Furthermore, in the process of V2V computation offloading, some traditional methods, such as game theory [4] and auction mechanisms [5], [6], can only optimize the relationship between resource assignment and service price, so the reliability of resource allocation in service vehicles is hardly guaranteed. Therefore, it is necessary for VEC to establish a mechanism that ensures security and reliability in the process of vehicular computation offloading, and meanwhile motivates vehicles to share more computing resources to reduce the offloading delay and improve the completion ratio of vehicular offloading tasks.

To meet the requirements of security and reliability in computation offloading, some works [7], [8], [9] have applied blockchain in edge computing. In the scenario of VEC, blockchain can effectively improve the security and reliability of V2V computation offloading [10], where transactions in terms of vehicular computation offloading are recorded in a
decentralized, transparent and immutable manner [11], [12]. Moreover, blockchain relies on data transmitted by the P2P network layer, while the vehicular edge computing (VEC) paradigm originates from P2P but extends the peers to vehicles; thus, blockchain and VEC have the same distributed network framework [13], [14]. Besides, blockchain can be regarded as a distributed database, and blockchain consensus requires massive computation, while in VEC, vehicles can access edge servers for storage and computing services. Therefore, blockchain and VEC have the same functions of storage and computation. It is worth noting that in a public blockchain, all distributed nodes need to participate in the consensus process, and block generation and verification take a long time, which is not suitable for highly dynamic vehicular networks. As a result, some works [15], [16] employed consortium blockchain in the VEC system, where only a part of the BSs are authorized to participate in the consensus process of the blockchain. To improve the efficiency of consensus, some works [17], [18] proposed non-compute-intensive algorithms based on practical Byzantine fault tolerance (PBFT), which are suitable for lightweight IoV-oriented blockchains.

Furthermore, there are still some challenges to be solved in blockchain-enabled VEC. First, due to the dynamic vehicular environment, it is challenging to establish an efficient mechanism for vehicular task offloading that considers vehicle mobility and heterogeneous vehicular computing capability. Second, how to motivate vehicles to contribute their onboard computing resources while guaranteeing the reliability of onboard resource allocation is also a problem that should be carefully investigated. Third, the performance of a PBFT-based consensus protocol in a consortium blockchain depends on the number of consensus nodes and the block size. Therefore, how to design a scheme that jointly optimizes the consensus nodes selection and block size with the aim of improving the performance of the blockchain is also a challenge.

Motivated by the challenges mentioned above, we design a VEC framework based on a lightweight blockchain, which makes V2V computation offloading secure, reliable, and efficient. Besides, considering the dynamic vehicular environment, the policy of vehicular task offloading should adapt to the state of the vehicular network. We thus utilize deep reinforcement learning (DRL) in vehicular task allocation, where the policy of task offloading is updated in real time by observing the state of the vehicular network. The main contributions of this paper are summarized as follows:

• We propose a blockchain-enabled VEC framework, where BSs are utilized to maintain a consortium blockchain. The blockchain is responsible for evaluating the reliability of vehicles according to historical transactions and further motivating vehicles to make appropriate resource allocation. Moreover, the smart contract in the blockchain facilitates the sharing of vehicular computing resources on demand by automatically running the vehicular computation offloading algorithm, and guarantees the security and reliability of task offloading.
• We develop a smart contract-based vehicular task allocation scheme, where dynamic pricing and DRL are utilized to motivate vehicles to contribute their idle computing resources and improve the efficiency and reliability of V2V task offloading. Furthermore, to make the proposed scheme adapt to the dynamic vehicular environment, the vehicular computing capability, the V2V link state, and the reliability of service vehicles are all considered in task allocation.
• Since the reliability of service vehicles is evaluated by previous task offloading events recorded in the transactions, and the accuracy of the estimated reliability depends on the timeliness of the transactions, we design a PBFT-based consensus, and develop a DRL-based algorithm to improve the efficiency of transaction verification by jointly optimizing consensus nodes selection and block size, and meanwhile motivate BSs to improve reliability in task allocation.

The remainder of the paper is organized as follows. The related works are presented in Section 2. In Section 3, we present the architecture of the blockchain-enabled VEC. Section 4 describes the V2V computation offloading algorithm in the smart contract. The consensus nodes selection in the blockchain is detailed in Section 5. Finally, we present the simulation results and conclusion in Sections 6 and 7, respectively.

2 RELATED WORK
2.1 Vehicular Edge Computing
The problem of vehicular task offloading in VEC has been widely investigated in recent works, and some optimization methods have been utilized to solve the problem. In [19], the task offloading problem was solved by a modified Ant Colony Optimization algorithm. In [5], the problem of computation offloading in VEC was formulated as an auction-based graph job allocation problem and solved by a structure-preserved matching algorithm. To minimize the overall response time of offloading tasks with interdependency, a modified genetic algorithm-based task scheduling scheme that considered the instability and heterogeneity of VEC was developed in [20]. Moreover, some learning-based methods were also proposed to solve the problem of vehicular task offloading. In [21], a multi-timescale DRL-based resource allocation framework was proposed by considering the constraints of vehicle mobility and task delay in VEC. In [1], the authors designed an imitation learning-enabled online task scheduling algorithm for VEC. Since executing offloading tasks occupies a vehicle's onboard computing resources and consumes the vehicle's energy, some vehicles may not be willing to execute offloading tasks. To motivate vehicles to contribute their idle computing resources, some incentive mechanisms were proposed in recent works. For example, the authors in [22] designed a contract-based incentive mechanism that combined resource contribution and resource utilization, while in [23], a timeliness-aware incentive mechanism was proposed to stimulate the participation of vehicles with the consideration of the vehicle's uncertain travel time.

All of the works mentioned above treated the service providers as trustworthy devices and did not consider the security and privacy of task offloading in VEC. In this work, we exploit blockchain technology to guarantee security and reliability in vehicular computation offloading.
2.2 Blockchain for Vehicular Networks
Recently, the integration of blockchain and vehicular networks has been studied in some works, where blockchain is utilized to ensure the security of vehicular task offloading. In [24], blockchain and smart contract were employed to facilitate fair task offloading and mitigate security attacks in VEC. In [25], consortium blockchain was employed in a vehicular computing resource trading system to guarantee transaction security and privacy protection in a distributed manner. Authors in [26] proposed a resource transaction architecture based on blockchain to provide resource preallocation service for vehicles and eliminate reliance on third-party platforms. Moreover, to ensure the reliability of vehicles in task offloading, some works employed blockchain to store the social reputation of vehicles which is used for offloading. In [27], RSUs allocated computational tasks for nearby fog vehicles based on repute scores maintained by a semi-private consortium blockchain. In [15], a lightweight blockchain-based resource sharing scheme was proposed in VEC, where the reputation value was designed to indicate the trustworthy degree of vehicles in task offloading. Besides, to optimize the performance of blockchain-based VEC systems, some works also proposed various consensus algorithms to improve the efficiency of blockchain consensus. Authors in [18] exploited Byzantine fault tolerance-based Proof-of-Stake consensus to provide fast and deterministic block finalization, and provided a contract-based incentive mechanism to motivate vehicles to share their computing resources.

However, few of the works mentioned above investigated the dynamic trust management for both BSs and vehicles in VEC. In this paper, we propose a blockchain-enabled VEC framework that considers the reliability of both BSs and service vehicles in task allocation and task execution. We further develop a DRL-based algorithm combined with dynamic pricing for vehicular computation offloading, and design an enhanced consensus protocol to improve the performance of blockchain.

TABLE 1
Summary of Key Notations

Notation                      Description
$a$                           Cost of signing a block/transaction
$b$                           Cost of verifying a signature
$c$                           Cost of generating/verifying a MAC
$N_c$                         Number of consensus nodes
$f$                           Maximum number of faulty nodes
$S_{bk}$; $\bar{S}_{bk}$      Block size, maximum block size
$M_t$                         Number of transactions in a block
$x$                           Average transaction size
$T^I$                         Block interval
$T^V$                         Time of block verification
$T^F$                         Time-to-finality (TTF)
$f_n$                         The $n$th offloading task
$D_n$                         Data size of task $f_n$
$C_n$                         Computation size of task $f_n$
$t_n$                         Maximum delay of task $f_n$
$t_n$                         Offloading delay of task $f_n$
$f_n$                         Allocated computing resources for task $f_n$
$p_n$                         Service price of task $f_n$
$U_n$                         Utility of $f_n$
$V_s$                         The $s$th service vehicle
$F_s$                         Available computing resources in $V_s$
$h_s$                         Reliability of $V_s$
$x_{sn}$                      Decision variable of whether $f_n$ is offloaded to $V_s$
$V_t$                         Task vehicle
$D$                           Communication range of a vehicle
$U_n$                         Utility of task vehicle for offloading $f_n$
$I_b$                         The $b$th BS
$G_b$                         Available computing resources in $I_b$
$i_b$                         Reliability of $I_b$
$y_b$                         Decision variable of whether $I_b$ is consensus node
$N_{tr}$                      Number of transactions generated in a block interval

3 BLOCKCHAIN-ENABLED VEC
In this section, we will demonstrate the architecture of our
proposed blockchain-enabled VEC system in Sections 3.1
and 3.2, and further describe our proposed smart contract
for vehicular computation offloading and the consensus
scheme for the blockchain-enabled VEC in Sections 3.3 and
3.4, respectively. For ease of reference, the key notations in
this paper are summarized in Table 1.

3.1 VEC System Model
As shown in Fig. 1, we consider a VEC scenario where multiple BSs are distributed alongside the roads, and each BS is equipped with an edge server that is responsible for computing resource allocation among vehicles. When a vehicle enters the communication range of a BS and is willing to participate in the VEC, it first sends a registration request message that includes the ID, current position, velocity, and available onboard computing resources to the BS.

Fig. 1. The architecture of the blockchain-enabled VEC system.

When a vehicle cannot execute all of its local tasks due to the limited onboard computing resources, it can offload its computing tasks to either BSs or neighboring vehicles that drive in the same direction. As the number of task offloading requests increases, the BS cannot handle a large number of tasks simultaneously due to its limited computing capability, so some computing tasks should be offloaded to neighboring vehicles with idle computing resources. In this work, we will mainly focus on the problem of V2V computation offloading in VEC.
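As a rough illustration of the registration step, the following Python sketch models the registration request and filters the vehicles a BS might consider as neighbors of a task vehicle. The class and function names, the one-dimensional road model, and the use of the velocity sign to encode driving direction are all assumptions made for this example; the paper only specifies which fields the message carries and that neighbors drive in the same direction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegistrationRequest:
    """Fields named in Section 3.1; the field names themselves are hypothetical."""
    vehicle_id: str
    position: float  # position along the road (1-D simplification, assumed)
    velocity: float  # signed speed; the sign encodes direction (assumed)
    free_cpu: float  # available onboard computing resources


def candidate_service_vehicles(
    task_vehicle: RegistrationRequest,
    registered: List[RegistrationRequest],
    comm_range: float,
) -> List[RegistrationRequest]:
    """Return registered vehicles that could serve the task vehicle.

    A candidate drives in the same direction, lies within the communication
    range, and has idle computing resources.
    """
    return [
        v
        for v in registered
        if v.vehicle_id != task_vehicle.vehicle_id
        and v.velocity * task_vehicle.velocity > 0  # same driving direction
        and abs(v.position - task_vehicle.position) <= comm_range
        and v.free_cpu > 0
    ]
```

In this sketch the BS would call `candidate_service_vehicles` with the communication range $D$ from Table 1; how the final service vehicle is chosen among the candidates is left to the task allocation algorithm of Section 4.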

Since the position of BSs is stationary and BSs serve nearby vehicles all the time, BSs can be used to collect vehicular computing service requests, observe the states of the vehicles inside the communication range, and allocate vehicular computing tasks to vehicles with idle computing resources. If a vehicle needs to offload part of its computing tasks due to the limited onboard computing resources, it first sends a service request to the nearby BS. We call this vehicle the task vehicle, and call the vehicles in the communication range of the task vehicle the service vehicles. Then, the BS performs the vehicular task allocation and selects some of the service vehicles to execute the computing tasks from the task vehicle. During the process of vehicular task allocation, the BS can collect massive experiences in terms of vehicular task offloading, which can be used for training the DRL-based algorithm of vehicular task allocation. Moreover, to motivate service vehicles to contribute their idle computing resources, the task vehicle pays a service price for each offloading task, and the selected service vehicle allocates part of its computing resources for the offloading task according to the service price. Furthermore, the BS can use the pre-trained DRL model and perform the algorithm of task allocation to dynamically select the service vehicles and the service price according to the states of vehicles in the communication range of the BS.

3.2 Blockchain Model
Due to the highly dynamic vehicular environment, a task vehicle driving on the road may have different neighboring service vehicles in different time slots, and the task vehicle may not obtain much information from all of the neighboring vehicles in real time, which makes it difficult to evaluate the willingness of neighboring vehicles to contribute their computing resources. In some cases, a service vehicle agrees to accept the service request from the task vehicle, but the computing resources that the service vehicle allocates for the offloading task cannot ensure the task is completed within the deadline, which leads to an offloading failure. Therefore, it is necessary to construct an authentication mechanism for vehicles in VEC. Blockchain is deemed a promising solution, where the historical transactions of vehicular task offloading are stored and utilized to evaluate the reliability of vehicles in resource allocation. Additionally, a public blockchain requires massive computation and time in the consensus process, which is not suitable for vehicular networks. We thus employ a lightweight consortium blockchain in the VEC system, where only a part of the BSs are authorized to participate in the blockchain consensus. Compared to the public blockchain, there are fewer consensus nodes in the consortium blockchain, and the computation complexity of blockchain consensus is lower.

In the blockchain-enabled VEC, considering the fast-changing situation in vehicular networks due to the mobility of vehicles, the blockchain in the VEC system is maintained by stationary BSs, and the transactions in terms of vehicular task offloading are recorded by BSs as well. Moreover, the transactions are grouped and stored in a persistent, immutable, and tamper-proof ledger, and are verified by the consensus before being attached to the blockchain. Furthermore, the traceability of V2V computation offloading in the blockchain makes it possible to guarantee the security and reliability of vehicles and BSs. Specifically, when a BS performs the vehicular task allocation, it first collects the information of vehicles in the communication range of the BS and finds the reliability of the vehicles by looking up the table of reliability values stored in the BS with vehicle IDs, and then selects the service vehicle and service price by a DRL-based task allocation algorithm. The table of reliability values is periodically updated according to the verified transactions in terms of vehicular task offloading. In addition, to motivate BSs to perform vehicular task allocation appropriately, we also evaluate the reliability of BSs according to their performance in vehicular task allocation and regard the reliability of BSs as a basis in the consensus nodes selection.

3.3 Smart Contract
In the blockchain-enabled VEC, the smart contract is a predefined script [10], which is utilized to perform the computing task allocation among vehicles and record the events of task offloading automatically. To incentivize vehicles to share their idle computing resources, a dynamic pricing scheme is exploited in vehicular task offloading, where service vehicles allocate their idle computing resources according to the received service price. Furthermore, a DRL-based algorithm of vehicular task allocation is developed to dynamically select the service vehicle and determine the service price for each offloading task from task vehicles. Once a task vehicle sends a task offloading request to the BS, the smart contract in the BS processes the offloading request and triggers the DRL-based task allocation algorithm deployed in the BS. After the task is completed by the service vehicle, the BS verifies the execution result. If the result is valid and the offloading delay does not exceed the deadline of the task, the task vehicle pays the service price to the service vehicle, and a transaction that records the task offloading event is generated in the BS. The detailed algorithm of task allocation is presented in Section 4.

3.4 Consensus Process
After each offloading task is completed, the offloading information is recorded as a transaction in the blockchain system, which includes the IDs of the task vehicle and service vehicle, the task profile, the offloading delay, the execution result of the task, and the timestamp. In each round of blockchain consensus, newly generated transactions are verified by blockchain nodes and permanently stored in blocks, and the verified transactions can be used for evaluating the reliability of service vehicles and BSs. Considering that the accuracy of the estimated reliability depends on the timeliness of the transactions, the delay of transaction verification should be optimized in the blockchain consensus. We thus propose a consensus algorithm to improve the efficiency of transaction verification. In the following, we will demonstrate how to implement the consensus in the blockchain system.

Due to the highly dynamic nature of vehicular networks, the consensus delay of the blockchain should be short enough so that the information of vehicular computation offloading can be updated in time. Some existing consensus protocols, such
as Proof-of-Work (PoW) and Proof-of-Stake (PoS), suffer from long consensus times, which makes them unsuitable for the VEC scenario. Therefore, in the consensus of the blockchain-enabled VEC, we employ the PBFT protocol [28], [29], which has been widely applied in consortium blockchains and can achieve lower consensus latency than PoW and PoS. In the traditional PBFT consensus algorithm [30], the consensus nodes are predetermined and the primary node is selected from the consensus nodes in round-robin order. Since the consensus performance is influenced by the number of consensus nodes, we propose an enhanced consensus scheme based on PBFT to optimize the performance of blockchain consensus, where the primary node and replica nodes are dynamically selected from BSs in a certain area according to the state of BSs, and the block size is determined according to the state of BSs as well. In the consensus scheme, the primary node and replica nodes are responsible for block generation and block verification, respectively, and the unselected BSs act as the normal nodes, which only record task offloading results as transactions in the blockchain and do not participate in the block generation and verification. Additionally, the blockchain consensus employs signatures and message authentication codes (MACs) to ensure the integrity and authentication of a transaction. In the consensus, signing a block or transaction, verifying a signature, generating a MAC, and verifying a MAC require $a$, $b$, $c$, and $c$ CPU cycles [31], [32], respectively. Moreover, to guarantee the security of the blockchain system, the consensus allows at most $f = \lfloor (N_c - 1)/3 \rfloor$ consensus nodes to be faulty [29]. Finally, the main steps of the consensus are shown as follows.

3.4.1 Consensus Nodes Selection
At the beginning of each round of consensus, the primary node in the previous round of consensus performs the algorithm of consensus nodes selection to select one primary node and multiple replica nodes from BSs in a given area, and the other BSs in the area are the normal nodes. The available computing resources for the consensus in the BS and the reliability of the BS in vehicular task allocation are the two main factors that affect the consensus nodes selection. In other words, a BS with higher reliability and more computing resources will have a higher probability of being selected as a consensus node. In each round of consensus, each consensus node will gain a reward for its contribution to the blockchain consensus. Therefore, to gain more rewards from the blockchain, each BS tends to improve its reliability in vehicular task allocation, which prevents some malicious BSs from giving inappropriate or unfair policies in vehicular task allocation. The detailed algorithm of consensus nodes selection is presented in Section 5.

3.4.2 Collect
At the beginning of a block interval, the primary node first collects unverified transactions from all BSs and sorts the transactions by timestamps. Then, the primary node forms the first $M_t$ transactions as an unvalidated block, and the size of the block is denoted as $S_{bk}$. We assume that the average size of transactions is $x$, then we have $M_t = \lfloor S_{bk}/x \rfloor$. Then, the primary node verifies $M_t$ signatures and $M_t$ MACs from the $M_t$ transactions. The computation cost of the primary node is $M_t(b + c)$.

3.4.3 Pre-Prepare
In this step, the primary node generates one signature and one MAC for the unvalidated block and generates $N_c - 1$ MACs for the pre-prepare message. The pre-prepare message that contains the block is then sent to other replica nodes. Each replica node verifies one MAC from the block, and $M_t$ MACs and $M_t$ signatures from the transactions in the block. Then, the computation cost of the primary node is $a + N_c c$, and the computation cost of each replica node is $c + M_t(b + c)$. In addition, without loss of generality, we assume that the transmission time of a block is proportional to the block size. Then, the time of message delivery is $t_{bk} S_{bk}$.

3.4.4 Prepare
Once the pre-prepare message is verified, replica nodes send the prepare message to other consensus nodes. After each replica node collects $2f$ prepare messages matched with the pre-prepare message, the consensus enters the next step. In this step, the primary node needs to verify $2f$ MACs, and every replica node needs to generate $N_c - 1$ MACs and verify $2f$ MACs. Therefore, the computation cost of the primary node is $2fc$, and the computation cost of each replica node is $(N_c - 1 + 2f)c$. The time of message delivery is $t_{bk} S_{bk}$.

3.4.5 Commit
Upon receipt of $2f$ matching prepare messages, each consensus node sends a commit message to the other consensus nodes. In this step, the primary node and each replica node verify $2f$ MACs and generate $N_c - 1$ MACs. The computation cost of the primary node and each replica node is $(N_c - 1 + 2f)c$. The time of message delivery is $t_{bk} S_{bk}$.

3.4.6 Reply
Upon receipt of $2f$ matching commit messages, replica nodes accept the block as valid, append the verified block to their local ledgers, and send the reply message containing the validated block to other BSs. Then, each BS updates its global view from the verified block after receiving $f + 1$ matching reply messages. In this step, the primary node and each replica node generate one MAC for the reply message, so the computation cost of each node is $c$. The time of message delivery is $t_{bk} S_{bk}$.

The main process of consensus is shown in Fig. 2. Finally, the computation time of the primary node is

$$T_p^V = \frac{M_t(b + c) + a + (2N_c + 4f)c}{G_p}, \tag{1}$$

where $G_p$ is the available computing resources in the primary node. Similarly, the computation time of a replica node is

$$T_r^V = \frac{M_t(b + c) + (2N_c + 4f)c}{G_r}. \tag{2}$$

Similar to [33], we use time to finality (TTF) to represent the latency of the consensus process, which is shown as
SHI ET AL.: DRL-BASED V2V COMPUTATION OFFLOADING FOR BLOCKCHAIN-ENABLED VEHICULAR NETWORKS 3887

Fig. 2. The main process of the PBFT-based consensus.

$$T^F = T^I + T^D + T^V, \tag{3}$$

where $T^I$, $T^D$, and $T^V$ denote the block interval, the time of message delivery, and the time of verification, respectively. According to the above analysis, we have $T^D = 4\tau_{bk} S_{bk}$ and $T^V = \max\{T_p^V, T_r^V\}$. Moreover, to prevent newly generated blocks from suffering a long verification time, the TTF should satisfy the following constraint:

$$T^F \le \delta T^I, \tag{4}$$

where $\delta > 1$ is a constant. In (4), $T^F$ contains the block generating interval $T^I$ and the time of block verification, so $T^F \le \delta T^I$ ensures that the time of block verification stays below a threshold determined by $T^I$, and newly generated blocks can therefore be verified in each round of consensus.

3.5 Process of Blockchain-Enabled VEC
The main process of our proposed blockchain-enabled VEC system is presented in Fig. 3. In the VEC system, a task vehicle that has several offloading tasks first sends an offloading request to the nearby BS, which triggers the smart contract in the BS. The BS collects the information of the vehicles surrounding the task vehicle, evaluates the reliability of these vehicles according to the transactions stored in the consortium blockchain, and then selects a service vehicle for task offloading. After the V2V computation offloading is finished, a transaction that records the result of the computation offloading is generated and verified by the blockchain consensus. In each round of consensus, the consensus nodes are selected from the BSs in the VEC system and then perform the PBFT-based consensus. Finally, the verified transactions are stored in the blockchain system and can be further used for evaluating the reliability of vehicles and BSs.

Fig. 3. The main process of the blockchain-enabled VEC system.

4 SMART CONTRACT FOR V2V COMPUTATION OFFLOADING
In this section, we detail our proposed smart contract for vehicular task allocation in the blockchain-enabled VEC system. In the system, vehicular task allocation in a BS is a sequential decision problem, which can be formulated as a Markov decision process (MDP). To solve the problem, we develop a DRL-based vehicular task allocation algorithm to improve the efficiency and the reliability of vehicular computation offloading.

We first consider the VEC scenario illustrated in Fig. 1, where multiple vehicles drive on a one-way road and several BSs are distributed at the roadside. Similar to [34], we utilize a free flow model to characterize the traffic in the communication range of a BS. The relationship between the average velocity of vehicles $v$ and the traffic density $\rho$ is then

$$v = v_{\max} \left( 1 - \frac{\rho}{\rho_{\max}} \right), \tag{5}$$

where $v_{\max}$ and $\rho_{\max}$ are the maximum velocity of vehicles and the maximum traffic density, respectively.

4.1 Computation Model
All of the offloading tasks in the task vehicle are assumed to be computation-intensive and are offloaded independently in VEC. The profile of an offloading task is denoted as $\{D_n, C_n, \tau_n\}$, where $D_n$ is the data size of the task, $C_n$ is the computation size, i.e., the CPU cycles required to complete the task, and $\tau_n$ is the maximum tolerable delay of the task. In many cases, the execution result of a task is much smaller than the data size $D_n$; we thus ignore the transmission time of the execution result in the calculation of the offloading delay. Then, the offloading delay of a task is

$$t_n = \frac{D_n}{r_{ts}} + \frac{C_n}{f_n}, \tag{6}$$

where $r_{ts}$ is the transmission rate of the V2V link between task vehicle $V_t$ and service vehicle $V_s$, and $f_n$ is the computing resources allocated to the offloading task in the service vehicle. Similar to [35], we utilize the channel capacity to estimate the V2V transmission rate, which can be given as

$$r_{ts} = W_{ts} \log_2 \left( 1 + \frac{P_t d_{ts}^{-\alpha} h_{ts}^2}{\sigma_w^2 + I_{ts}} \right), \tag{7}$$

where $W_{ts}$ is the allocated bandwidth of the V2V link between $V_t$ and $V_s$, $P_t$ is the transmission power of $V_t$, $d_{ts}$ is the distance between $V_t$ and $V_s$, $\alpha$ is a constant that represents the path loss factor, $h_{ts}$ is the channel gain of the V2V link, $\sigma_w^2$ represents the power spectrum density of additive white Gaussian noise, and $I_{ts}$ denotes the interference introduced by other V2V transmissions.
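The delay model in (5)-(7) can be illustrated with a minimal Python sketch. The function names and the numeric inputs in the usage note (transmit power, distance, path loss exponent, noise level) are hypothetical values chosen for illustration, not parameters taken from the paper; the noise term is treated here as a total noise power in watts, which is an assumption.

```python
import math


def free_flow_velocity(rho, rho_max, v_max):
    """Average vehicle velocity under the free flow model, Eq. (5)."""
    return v_max * (1.0 - rho / rho_max)


def v2v_rate(bandwidth_hz, tx_power_w, distance_m, path_loss_exp,
             channel_gain, noise_w, interference_w):
    """Channel-capacity estimate of the V2V rate in bit/s, Eq. (7).

    Assumption: noise_w and interference_w are powers in watts.
    """
    received = tx_power_w * distance_m ** (-path_loss_exp) * channel_gain ** 2
    sinr = received / (noise_w + interference_w)
    return bandwidth_hz * math.log2(1.0 + sinr)


def offloading_delay(data_bits, cpu_cycles, rate_bps, alloc_hz):
    """Transmission time plus execution time of one task, Eq. (6)."""
    return data_bits / rate_bps + cpu_cycles / alloc_hz


if __name__ == "__main__":
    # Hypothetical link: 10 MHz bandwidth, 0.2 W, 100 m, path loss exponent 3.
    rate = v2v_rate(10e6, 0.2, 100.0, 3.0, 1.0, 1e-10, 0.0)
    # Hypothetical task: 1 Mbit of data, 1e9 cycles, 2 GHz allocated.
    print(rate, offloading_delay(1e6, 1e9, rate, 2e9))
```

With these illustrative inputs the execution term $C_n / f_n$ dominates the delay, which is consistent with the assumption that the tasks are computation-intensive.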
3888 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 22, NO. 7, JULY 2023

Particularly, due to the short V2V link duration and limited vehicular computing capability, we do not consider the queuing time in the task offloading. That is, if a V2V link interruption occurs in the process of task offloading, the task offloading is regarded as a failure, and the utility of the task is negative as a penalty. Then, the task vehicle needs to resubmit a new service request to the BS for the task.

4.2 Task Vehicle Model
When a task vehicle offloads a task to a service vehicle, it needs to pay the service vehicle the service price. We assume that the service price of performing a computing task in a service vehicle is proportional to the computation size of the task, which is represented as $p_n C_n$, where $p_n$ denotes the unit price.

For a computing task $\phi_n$, similar to [36], [37], the utility is related to the completion time of the task, which is shown as

$$U_n = \begin{cases} \log(1 + \tau_n - t_n), & t_n \le \tau_n, \\ \Lambda, & t_n > \tau_n, \end{cases} \tag{8}$$

where $\Lambda < 0$ is a constant that represents the penalty of offloading failure. Then, the utility of $V_t$ for offloading a task can be given as

$$\bar{U}_n = U_n - p_n C_n. \tag{9}$$

4.3 Service Vehicle Model
In the process of task offloading, once a service vehicle receives a service request and the corresponding service price, the service vehicle contributes part of its computing resources to the offloading task. We consider a service vehicle $V_s$, assume that the cost of executing a computing task is proportional to the energy consumption, and denote the computing resources allocated for task $\phi_n$ as $f_n$. Similar to [38], the energy consumption of executing task $\phi_n$ is calculated by

$$E_n = \kappa f_n^3 \frac{C_n}{f_n} = \kappa f_n^2 C_n, \tag{10}$$

where $\kappa > 0$ is a constant. Then, the utility of $V_s$ for executing task $\phi_n$ can be given as

$$U_n^s = p_n C_n - \varpi \kappa f_n^2 C_n, \tag{11}$$

where $\varpi$ is the coefficient that represents the price per unit of energy consumption.

Since the utility of a service vehicle must be non-negative, the range of $f_n$ is $[0, \min\{F_s, \sqrt{p_n/(\varpi\kappa)}\}]$, where $F_s$ is the maximum available computing resources in $V_s$, and $\sqrt{p_n/(\varpi\kappa)}$ is the upper bound of $f_n$ obtained by requiring $U_n^s \ge 0$ in (11). Moreover, it can be seen from (11) that the utility of a service vehicle mainly depends on the unit service price and the computing resources allocated for task $\phi_n$. A service vehicle can increase its utility by allocating fewer computing resources for task $\phi_n$, but this reduces the utility of task $\phi_n$ or even prevents the task from being completed within the deadline. To prevent malicious vehicles from allocating insufficient computing resources for offloading tasks, we further establish a reliability model to evaluate the efficiency of task offloading in a service vehicle.

We first define a normalized utility for an offloading task that can be completed within the deadline, which is shown as follows:

$$\tilde{U}_n = \frac{\log(1 + \tau_n - t_n)}{\log(1 + \tau_n)}, \tag{12}$$

where $\log(1 + \tau_n)$ represents the maximum utility of task $\phi_n$. After a service vehicle completes an offloading task, the computation efficiency of the service vehicle is updated by

$$\varsigma_s = (1 - \omega_1)\varsigma_s' + \omega_1 \tilde{U}_n, \quad \omega_1 \in (0, 1), \tag{13}$$

where $\varsigma_s'$ represents the previous computation efficiency of $V_s$. Besides, the completion ratio of a service vehicle is updated by

$$\zeta_s = \frac{N_s \zeta_s' + \mathbb{1}(t_n \le \tau_n)}{N_s + 1}, \tag{14}$$

where $N_s$ represents the total number of offloading tasks received by $V_s$, $\zeta_s'$ denotes the previous completion ratio of $V_s$, and $\mathbb{1}(\cdot)$ is the indicator function. Then, the reliability of a service vehicle can be defined as

$$\eta_s = \omega_2 \varsigma_s + (1 - \omega_2)\zeta_s, \quad \omega_2 \in (0, 1). \tag{15}$$

As a result, if a service vehicle cannot allocate appropriate computing resources for an offloading task according to the service price, the reliability of the service vehicle is reduced. In our proposed V2V computation offloading algorithm, the reliability of service vehicles is an important factor that affects the selection of service vehicles and the service price: a vehicle with low reliability usually has a low probability of being selected as the service vehicle for offloading tasks.

4.4 Problem Formulation: Vehicular Task Allocation
In the VEC system, we consider a task vehicle denoted as $V_t$. The service vehicles in the communication range of $V_t$ are denoted as $\mathcal{S} = \{V_1, \ldots, V_s, \ldots, V_S\}$. When $V_t$ sends a task offloading request to a BS, the smart contract in the BS is automatically triggered. We assume that there are $N$ offloading tasks from $V_t$ in a period, which are denoted as $\mathcal{N} = \{\phi_1, \ldots, \phi_n, \ldots, \phi_N\}$. For each offloading task, the BS dynamically chooses the service vehicle and determines the corresponding service price by observing the vehicular environment. After the task is completed by the service vehicle, the execution result is transmitted to $V_t$ and the BS, and then the BS verifies the result and sends the validity of the result to $V_t$. If the execution result is valid and the offloading delay is within the maximum tolerable delay of the task, $V_t$ pays the service price to the service vehicle. Otherwise, the service vehicle does not obtain any payment from $V_t$. Then, the BS generates a transaction that records the task offloading event. In the V2V computation offloading, we formulate the problem of vehicular task allocation in the BS as an optimization problem with the objective of maximizing the utility of $V_t$, taking into account the mobility, the computing capability, and the reliability of vehicles. The optimization problem is formulated as follows:

$$
\begin{aligned}
\max_{A} \quad & \frac{1}{N} \sum_{n=1}^{N} \sum_{s=1}^{S} x_n^s \bar{U}_n, \\
\text{s.t.} \quad & \text{C1}: x_n^s \in \{0, 1\}, \quad \forall n \in \mathcal{N}, s \in \mathcal{S}, \\
& \text{C2}: \sum_{s=1}^{S} x_n^s = 1, \quad \forall n \in \mathcal{N}, \\
& \text{C3}: x_n^s (\lambda_s - t_n) \ge 0, \quad \forall n \in \mathcal{N}, s \in \mathcal{S}, \\
& \text{C4}: x_n^s (\eta_s - \eta_0) \ge 0, \quad \forall n \in \mathcal{N}, s \in \mathcal{S},
\end{aligned} \tag{16}
$$

where $A = \{a_1, \ldots, a_n, \ldots, a_N\}$, and $a_n = \{x_n^1, \ldots, x_n^s, \ldots, x_n^S, p_n\}$. In constraint C1, $x_n^s = 1$ means that task $\phi_n$ is offloaded to $V_s$, and $x_n^s = 0$ otherwise. Constraint C2 guarantees that a computing task is offloaded to exactly one service vehicle. Constraint C3 ensures that the completion time of an offloading task does not exceed the V2V link duration between $V_t$ and $V_s$, and the V2V link duration is estimated by

$$\lambda_s = \frac{\Delta}{|v_t - v_s|} - \frac{z_t - z_s}{v_t - v_s}, \tag{17}$$

where $z_t$, $v_t$ represent the position and velocity of the task vehicle, respectively, and $z_s$, $v_s$ represent the position and velocity of the service vehicle, respectively. Constraint C4 ensures that the reliability of the selected service vehicle is not lower than threshold $\eta_0$.

In dynamic vehicular networks, the V2V link state, the idle computing resources, and the reliability of service vehicles are all time-variant, and the multi-task allocation can be regarded as a sequential decision problem. We reformulate the optimization problem in (16) as an MDP, and the system state space, action space, and reward are defined as follows.

4.4.1 State Space
The system state at time $t$ can be defined as

$$
\begin{aligned}
s_t = [ & F_1(t), \ldots, F_s(t), \ldots, F_S(t), \gamma_1(t), \ldots, \gamma_s(t), \ldots, \gamma_S(t), \\
& \lambda_1(t), \ldots, \lambda_s(t), \ldots, \lambda_S(t), \eta_1(t), \ldots, \eta_s(t), \ldots, \eta_S(t), \\
& D(t), C(t), \tau(t) ],
\end{aligned} \tag{18}
$$

where $F_s(t)$ represents the available computing resources in $V_s$, $\gamma_s(t)$ is the signal-to-noise ratio (SNR) of the V2V link between $V_t$ and $V_s$, $\lambda_s(t)$ denotes the estimated V2V link duration between $V_t$ and $V_s$, $\eta_s(t)$ is the reliability of $V_s$, which is estimated from the historical task offloading transactions in the blockchain according to (15), and $D(t)$, $C(t)$, and $\tau(t)$ are the data size, computation size, and deadline of the offloading task at time $t$, respectively.

4.4.2 Action Space
The conducted action at time $t$ can be given as

$$a_t = [x_n^1(t), \ldots, x_n^s(t), \ldots, x_n^S(t), p_n(t)], \tag{19}$$

where $x_n^s(t) = 1$ represents that service vehicle $V_s$ is selected to execute task $\phi_n$, and $x_n^s(t) = 0$ otherwise, and $p_n(t)$ denotes the unit service price of $\phi_n$ paid to the selected service vehicle.

4.4.3 Reward Function
According to the optimization objective presented in (16), we define the immediate reward after conducting action $a_t$ at time $t$ as

$$R_t = \sum_{s=1}^{S} x_n^s \bar{U}_n, \tag{20}$$

where $\bar{U}_n$ is the utility of $V_t$ for offloading a task as defined in (9), which is related to the selected service vehicle and the service price.

4.5 SAC-Based Computation Offloading Algorithm
In the problem of vehicular task allocation, the action, which consists of the selection of a service vehicle and the determination of the service price, contains both discrete decision variables and a continuous variable. We thus employ the soft actor-critic (SAC) [39] method to obtain the hybrid decision policy in vehicular task allocation. SAC is a reinforcement learning algorithm based on the off-policy actor-critic model. To be specific, the actor network in SAC generates actions according to the observed system state and improves the policy, while the critic network is responsible for evaluating the policy provided by the actor network. In the following, we demonstrate how SAC works in task allocation.

4.5.1 Policy and Value Function
The main goal of SAC is to find an optimal policy that maximizes the expected long-term reward and, at the same time, maximizes the entropy of the policy to improve robustness and exploration. The policy is defined as an action-selection strategy, which can be either deterministic or stochastic. In SAC, the policy is stochastic and is denoted as $\pi(a|s)$, which represents a probability distribution over actions given an observed state $s$. We then define a soft action-value function that represents the expected long-term reward combined with the expected entropy of policy $\pi$ given an initial state $s$ and action $a$, which is shown as

$$Q^\pi(s, a) = \mathbb{E}\left[ \sum_{t=0}^{T-1} \nu^t R_t + \beta \sum_{t=1}^{T-1} \nu^t H(\pi(\cdot|s_t)) \,\middle|\, s_0 = s, a_0 = a \right], \tag{21}$$

where $\beta \in [0, 1]$ is the temperature parameter that controls the stochasticity of the policy, and $\nu$ is the discount factor. Further, we define a soft state-value function that denotes the expected long-term reward combined with the entropy of the policy when starting from state $s$ and taking actions according to some policy $\pi$, which is given as follows:

$$V^\pi(s) = \mathbb{E}\left[ \sum_{t=0}^{T-1} \nu^t \left( R_t + \beta H(\pi(\cdot|s_t)) \right) \,\middle|\, s_0 = s \right]. \tag{22}$$

Since the dimensions of both the state space and the action space could be extremely large with the increase of traffic density, we employ neural networks to approximate both the soft action-value function and the policy, which are denoted as $Q_\theta(s, a)$ and $\pi_\psi(a|s)$, respectively.
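To make the entropy-regularized objective in (22) concrete, the following minimal Python sketch evaluates the bracketed sum for a single sampled trajectory. It is an illustration under simplifying assumptions rather than the paper's implementation: the policy is represented as a discrete probability vector per step (the paper's action also contains a continuous price), and the function names `entropy` and `soft_return` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_return(rewards, policies, beta, nu):
    """One-trajectory sample of the sum inside Eq. (22).

    rewards[t]  : immediate reward R_t
    policies[t] : action probabilities pi(.|s_t), assumed discrete here
    beta        : temperature weighting the entropy bonus
    nu          : discount factor
    """
    total = 0.0
    for t, (reward, probs) in enumerate(zip(rewards, policies)):
        total += nu ** t * (reward + beta * entropy(probs))
    return total


if __name__ == "__main__":
    # Two-step toy trajectory: a uniform policy first, then a deterministic one.
    print(soft_return([1.0, 0.5], [[0.5, 0.5], [1.0, 0.0]], beta=0.2, nu=0.9))
```

A larger `beta` increases the contribution of the entropy bonus, so a uniform (more exploratory) policy scores higher than a deterministic one that collects the same rewards.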

4.5.2 Critic Network Training
In our algorithm, the critic network is in charge of policy evaluation. In each step, the agent of the BS samples a batch of experiences from replay buffer $\mathcal{B}$, and then trains the critic network. Specifically, the policy is evaluated by the mean square error (MSE) between the soft action-value function and the target soft action-value function, and the loss function is given as follows:

$$L_Q(\theta) = \mathbb{E}_{(s_t, a_t) \sim \mathcal{B}} \left[ \frac{1}{2} \left( Q_\theta(s_t, a_t) - \hat{Q}_{\hat{\theta}}(s_t, a_t) \right)^2 \right], \tag{23}$$

where the target soft action-value function is defined as

$$\hat{Q}_{\hat{\theta}}(s_t, a_t) = R_t + \nu \, \mathbb{E}_{s_{t+1} \sim \rho_\pi} \left[ V_{\hat{\theta}}(s_{t+1}) \right]. \tag{24}$$

Then, the critic network is trained by minimizing the loss function in (23) in each step.

4.5.3 Actor Network Training
The actor network is in charge of generating actions from the observed states. According to [39], the actor network is trained by minimizing the Kullback-Leibler (KL) divergence, which is presented as follows:

$$\pi_t = \arg\min_{\pi' \in \Pi} D_{KL} \left( \pi'(\cdot|s_t) \,\middle\|\, \frac{\exp\left( \beta^{-1} Q^\pi(s_t, \cdot) \right)}{Z^\pi(s_t)} \right), \tag{25}$$

where $Z^\pi(s_t)$ is the function that normalizes the distribution, which does not affect the gradient of the policy. We transform the expression of $\pi_t$ in (25) by multiplying by $\beta$ and ignoring the term $Z^\pi(s_t)$, and then obtain the loss function of the policy, which is given as follows:

$$L_\pi(\psi) = \mathbb{E}_{s_t \sim \mathcal{B}} \left[ \mathbb{E}_{a_t \sim \pi_\psi} \left[ \beta \log \pi_\psi(a_t|s_t) - Q_\theta(s_t, a_t) \right] \right]. \tag{26}$$

In each step, the actor network is trained by minimizing the loss function in (26).

4.5.4 Temperature Adjustment
In the above descriptions of the training process of the actor and critic networks, temperature parameter $\beta$ is regarded as a constant. According to [40], the temperature parameter can be learned in each step by minimizing the loss function

$$L_t(\beta) = \mathbb{E}_{a_t \sim \pi} \left[ -\beta \log \pi(a_t|s_t) - \beta H_0 \right], \tag{27}$$

where $H_0$ is the entropy threshold.

Moreover, to mitigate the positive bias in policy improvement, two critic networks are established in SAC [40], and both are trained independently. Then, in the calculation of the stochastic gradient of $L_Q(\theta)$ and the policy gradient of $L_\pi(\psi)$, we use the minimum soft Q-value of the two critic networks.

Finally, the detailed vehicular task allocation algorithm is presented in Algorithm 1.

Algorithm 1. SAC-Based Task Allocation Algorithm
Initialize: Initialize critic networks $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$ with weights $\theta_1$ and $\theta_2$, respectively.
Initialize target critic networks $\hat{Q}_{\hat{\theta}_1}(s, a)$ and $\hat{Q}_{\hat{\theta}_2}(s, a)$ with weights $\hat{\theta}_1$ and $\hat{\theta}_2$, respectively.
Initialize actor network $\pi_\psi(a|s)$ with weights $\psi$.
Initialize replay buffer $\mathcal{B} = \emptyset$.
for each period do
    for each task $\phi_n$ do
        The BS collects information of vehicles and evaluates current state $s_t$.
        Generates action $a_t$ that determines service vehicle $V_s$ and unit price $p_n$.
        Sends a task allocation message to $V_t$ and $V_s$.
        $V_s$ allocates computing resources for task $\phi_n$.
        The BS calculates the immediate reward $R_t$ and estimates the next state $s_{t+1}$.
        Store $(s_t, a_t, R_t, s_{t+1})$ into replay buffer $\mathcal{B}$.
    end for
    Sample a batch of experiences $E$ from $\mathcal{B}$.
    Update weights of $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$ by computing the gradient of $L_Q(\theta_i)$ in (23),
        $\theta_i = \theta_i - \epsilon_Q \frac{1}{|E|} \sum_{|E|} \nabla_{\theta_i} L_Q(\theta_i), \quad i = 1, 2.$
    Update weights of $\pi_\psi(a|s)$ by computing the gradient of $L_\pi(\psi)$ in (26),
        $\psi = \psi - \epsilon_\pi \frac{1}{|E|} \sum_{|E|} \nabla_\psi L_\pi(\psi).$
    Update $\beta$ by computing the gradient of $L_t(\beta)$ in (27),
        $\beta = \beta - \epsilon_\beta \frac{1}{|E|} \sum_{|E|} \nabla_\beta L_t(\beta).$
    Update weights of $\hat{Q}_{\hat{\theta}_1}(s, a)$ and $\hat{Q}_{\hat{\theta}_2}(s, a)$ by
        $\hat{\theta}_i = \omega_s \hat{\theta}_i + (1 - \omega_s)\theta_i, \quad i = 1, 2.$
end for

5 DRL-BASED CONSENSUS NODES SELECTION
In our proposed V2V computation offloading scheme, the reliability of service vehicles in resource allocation is a main factor that affects the decision in vehicular task allocation. In the blockchain-enabled VEC, the reliability of a vehicle in resource allocation is evaluated from the transactions of vehicular computation offloading, and the accuracy of the estimated reliability highly depends on the timeliness of the transactions. Therefore, it is necessary to design a consensus scheme that optimizes the efficiency of transaction verification.

In this section, we propose a consensus nodes selection algorithm for the PBFT-based consensus in the blockchain-enabled VEC system, where the consensus nodes and the block size in each round of blockchain consensus are both determined according to the states of BSs. To maximize the efficiency of blockchain consensus and meanwhile motivate

BSs to improve their reliability in vehicular task allocation, we model the consensus nodes and incorporate the incentive mechanism into the reward of consensus nodes in Section 5.1. Besides, considering that the available computing resources in BSs for consensus and the reliability of BSs are both time-variant, we employ a DRL-based algorithm in Section 5.3 so that the policy of consensus nodes selection can dynamically adapt to the state of the VEC system.

5.1 Consensus Node Model
In the blockchain-enabled VEC system, we assume that there are $B$ BSs distributed in a certain area, which are denoted as $\mathcal{I} = \{I_1, \ldots, I_b, \ldots, I_B\}$. In each round of blockchain consensus, all the consensus nodes are selected by the primary node of the previous round of consensus. According to the analysis of blockchain consensus presented in Section 3.4, the performance of consensus mainly depends on the block size and the available computing resources in BSs. As the traffic density in the communication range of a BS increases, more vehicles send computing service requests to the BS. Without loss of generality, we therefore assume that the arrival rate of vehicular service requests in a BS is proportional to the traffic density in the communication range of the BS. Then, the available computing resources for blockchain consensus in the BS can be given as

$$G_b = G_b^{total} - (\tilde{G}_b + \mu \rho_b), \tag{28}$$

where $G_b^{total}$ is the total computing capability of BS $I_b$, $\tilde{G}_b$ represents the computing resources reserved for non-vehicular computing tasks in $I_b$, $\rho_b$ is the traffic density in the communication range of $I_b$, and $\mu > 0$ is a constant. Following a similar assumption of block reward in [38], we define the reward of a consensus node after completing a round of consensus as

$$R_b(t) = \begin{cases} \left( \varepsilon_1 \iota_b + \varepsilon_2 \dfrac{S_{bk}}{x} \right) \log\left( 1 + T - T_b^V \right), & T_b^V \le T, \\ \Gamma, & T_b^V > T, \end{cases} \tag{29}$$

where $\iota_b$ is the reliability of BS $I_b$, $\varepsilon_1$ and $\varepsilon_2$ are the weights of reliability and number of validated transactions, respectively, $T_b^V$ denotes the delay of block verification in $I_b$, $T = \delta T^I - (T^I + T^D)$ represents the maximum tolerable delay of block verification in $I_b$, and $\Gamma < 0$ is a constant that represents the penalty. It can be seen from (29) that the reward of a consensus node is jointly determined by the block size and the reliability: a BS that validates more transactions and has higher reliability gets more reward in the consensus process. Therefore, BSs are motivated to improve their reliability in vehicular task allocation and contribute more computing resources to blockchain consensus. After a round of consensus is completed, the average reward obtained by the consensus nodes from the blockchain is

$$R(t) = \frac{1}{N_c} \sum_{b=1}^{N_c} R_b(t). \tag{30}$$

5.2 Problem Formulation: Consensus Nodes Selection
Our aim is to maximize the expected long-term reward of consensus nodes. According to the reward of consensus nodes defined in (30), we formulate the optimization problem as

$$
\begin{aligned}
\max_{A'} \quad & \sum_{t'=t}^{T} \varrho^{\,t'-t} R(t'), \\
\text{s.t.} \quad & \text{C1}: y_b(t') \in \{0, 1\}, \quad \forall b \in \mathcal{I}, \\
& \text{C2}: \sum_{b=1}^{B} y_b(t') = N_c, \\
& \text{C3}: y_b(t')(G_b - G) \ge 0, \quad \forall b \in \mathcal{I}, \\
& \text{C4}: y_b(t')(\iota_b(t') - L) \ge 0, \quad \forall b \in \mathcal{I}, \\
& \text{C5}: T^F - \delta T^I \le 0,
\end{aligned} \tag{31}
$$

where $A' = \{y_1(t'), \ldots, y_b(t'), \ldots, y_B(t'), S_{bk}(t')\}$, and $\varrho \in (0, 1]$ is the reward discount factor. In constraint C1, $y_b(t') = 1$ indicates that BS $I_b$ is selected as a consensus node at time $t'$, and $y_b(t') = 0$ otherwise. Constraint C2 indicates that the total number of consensus nodes is $N_c$. Constraint C3 guarantees that the available computing resources in a consensus node are not lower than threshold $G$. Constraint C4 ensures that the reliability of a consensus node is not lower than threshold $L$. Constraint C5 requires the consensus delay to satisfy the constraint in (4).

In the consensus nodes selection, the available computing resources for blockchain consensus in each BS depend on the traffic density in the communication range of the BS, and the reliability of each BS depends on its performance in vehicular task allocation; both are time-variant. Moreover, the consensus nodes selection is a joint decision-making problem, where the block size and multiple consensus nodes are jointly determined in each round of consensus, which is difficult to solve with traditional optimization methods. In the following, we transform the problem into an MDP and demonstrate how to solve it by using a DRL-based method.

5.2.1 State Space
In each period, the system state of all the BSs is denoted as

$$s_t = [G_1(t), \ldots, G_b(t), \ldots, G_B(t), \iota_1(t), \ldots, \iota_b(t), \ldots, \iota_B(t), N_{tr}(t)], \tag{32}$$

where $G_b(t)$ is the available computing resources for blockchain consensus in BS $I_b$ at time $t$, $N_{tr}(t)$ represents the number of transactions generated by BSs in a block interval, and $\iota_b(t)$ represents the reliability of $I_b$ in vehicular task allocation, which is evaluated by the average normalized utility defined in (12) and the completion ratio of the vehicular offloading tasks allocated by the BS; the calculation of $\iota_b(t)$ is the same as (15). It is worth noting that the reliability of a BS reflects the task allocation in the BS, which depends on the selection of the service vehicle and the service price, while the reliability of a service vehicle reflects the task execution in the service vehicle, which depends on the amount of computing resources allocated for offloading tasks.
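Since the reliability of both BSs and service vehicles follows the update rules in (12)-(15), a minimal Python sketch of one update step may help. The function and argument names are hypothetical, and one point is an assumption that the text does not specify: when a task misses its deadline, the normalized utility entering (13) is taken as 0 here.

```python
import math


def normalized_utility(deadline, delay):
    """Normalized utility of a task finished within its deadline, Eq. (12)."""
    if delay > deadline:
        raise ValueError("Eq. (12) is defined only for delay <= deadline")
    return math.log(1.0 + deadline - delay) / math.log(1.0 + deadline)


def update_reliability(prev_eff, prev_ratio, n_received,
                       deadline, delay, w1, w2):
    """Apply Eqs. (13)-(15) after one offloading task.

    Returns (efficiency, completion_ratio, reliability).
    """
    on_time = delay <= deadline
    # Assumption: a missed deadline contributes zero normalized utility.
    utility = normalized_utility(deadline, delay) if on_time else 0.0
    eff = (1.0 - w1) * prev_eff + w1 * utility               # Eq. (13)
    indicator = 1.0 if on_time else 0.0
    ratio = (n_received * prev_ratio + indicator) / (n_received + 1)  # Eq. (14)
    reliability = w2 * eff + (1.0 - w2) * ratio              # Eq. (15)
    return eff, ratio, reliability


if __name__ == "__main__":
    # Hypothetical history: 4 tasks received so far, 80% completed on time.
    print(update_reliability(0.5, 0.8, 4, deadline=1.0, delay=0.5, w1=0.5, w2=0.5))
    print(update_reliability(0.5, 0.8, 4, deadline=1.0, delay=2.0, w1=0.5, w2=0.5))
```

In this sketch a missed deadline lowers both the efficiency term and the completion ratio, so the reliability drops, which matches the intent that under-provisioning computing resources is penalized.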

5.2.2 Action Space
We represent the action conducted by the agent at time $t$ as

$$a_t = [y_1(t), \ldots, y_b(t), \ldots, y_B(t), S_{bk}(t)], \tag{33}$$

where $y_b(t) = 1$ indicates that BS $I_b$ is selected as a consensus node, and $y_b(t) = 0$ otherwise. $S_{bk}(t)$ denotes the size of the unverified block, and $S_{bk}(t) \in \{0.2, 0.4, \ldots, \dot{S}_{bk}\}$, where $\dot{S}_{bk}$ is the maximum block size in the blockchain. Furthermore, each consensus node has the potential to be selected as the primary node, and the other consensus nodes are the replica nodes. Among the consensus nodes chosen by $a_t$, we select the consensus node that maximizes $R(t)$ as the primary node.

5.2.3 Reward Function
According to the optimization objective defined in (31), the immediate reward in a round of consensus can be given as

$$R(s_t, a_t) = R(t). \tag{34}$$

5.3 Consensus Nodes Selection Algorithm
In the problem of consensus nodes selection, the selections of consensus nodes are discrete decision variables, and the block size is determined by the number of transactions in a block, which is also a discrete decision variable. Therefore, the action space is discrete. We thus employ the double deep Q-network (DDQN) [41], which is suitable for solving discrete action problems and can guarantee global optima. The aim of our algorithm is to find an optimal policy that maximizes the expected long-term reward of the consensus nodes. We first define a Q-value function $Q^\pi(s_t, a_t)$ to represent the expected long-term reward after performing action $a_t$ at state $s_t$ according to some policy $\pi$, which is shown as

$$Q^\pi(s_t, a_t) = \mathbb{E}\left[ R_t + \nu' Q^\pi(s_{t+1}, a_{t+1}) \mid s_t, a_t \right]. \tag{35}$$

Then, the optimal policy that maximizes the expected long-term reward is represented as

$$\pi^*(s_t) = \arg\max_a Q(s_t, a). \tag{36}$$

To obtain the optimal policy, a value iteration of the Q-value function is employed, which can be given as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \beta' \left( R_{t+1} + \nu' \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right). \tag{37}$$

Considering that the dimensions of the state space and action space could be extremely large, it is difficult to obtain the optimal policy by looking up a table that stores the Q-values of the historical experiences. Therefore, in DDQN, a neural network is employed to approximate the Q-value function, which is called the Q-network and is denoted as $Q(s, a; \theta)$, where $\theta$ represents the weights of the Q-network. Then, the optimal policy becomes $\pi^*(s_t) = \arg\max_a Q(s_t, a; \theta)$.

To mitigate the overestimation in the learning process, two Q-networks are established and trained independently, i.e., the main Q-network and the target Q-network, which are represented as $Q(s, a; \theta)$ and $Q^-(s, a; \theta^-)$, respectively. The main Q-network is responsible for choosing actions according to the observed system state, while the target Q-network is used to evaluate the policy obtained from the main Q-network. In the target Q-network, the target Q-value can be calculated as follows:

$$\zeta_t = R_t + \nu' Q^-\left( s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta); \theta^- \right). \tag{38}$$

Since we aim to make the Q-value of the main Q-network close to the target Q-value, the loss function of the main Q-network is defined as follows:

$$J(\theta) = \frac{1}{|E'|} \sum_{i=1}^{|E'|} \left( \zeta_i - Q(s_i, a_i; \theta) \right)^2. \tag{39}$$

In each period, the agent samples a batch of experiences $E'$ from the replay buffer, and trains the main Q-network by minimizing the loss function in (39). Finally, after every $N^-$ steps, the weights $\theta^-$ of the target Q-network are updated with an exponential moving average of $\theta$. The detailed algorithm is presented in Algorithm 2.

Algorithm 2. Consensus Nodes Selection Algorithm
Initialize: Initialize main Q-network $Q(s, a; \theta)$ and target Q-network $Q^-(s, a; \theta^-)$ with random weights $\theta$ and $\theta^-$, respectively.
Initialize replay buffer $\mathcal{B}' = \emptyset$.
for each period do
    Primary node evaluates current state $s_t$ according to the computing resources and reliability of each BS.
    Generate a random probability $p$.
    if $p < \epsilon$ then
        Randomly select action $a_t$ from the action space.
    else
        Choose action $a_t = \arg\max_a Q(s_t, a; \theta)$.
    end if
    Broadcast the message that contains the consensus nodes selection and block size.
    Store $(s_t, a_t, R_t, s_{t+1})$ into replay buffer $\mathcal{B}'$.
    Calculate target Q-value $\zeta_t$ by (38).
    Update the weights of $Q(s, a; \theta)$ by computing the gradient of $J(\theta)$ in (39), $\theta = \theta - \epsilon' \nabla_\theta J(\theta)$.
    Update the weights of $Q^-(s, a; \theta^-)$ every $N^-$ steps by $\theta^- = \omega_d \theta^- + (1 - \omega_d)\theta$.
end for

6 PERFORMANCE EVALUATION
6.1 Simulation Setup
We conduct our simulation on a desktop that has two NVIDIA TITAN Xp GPUs, 128 GB of RAM, and an Intel Xeon CPU. The simulation platform is based on PyTorch with Python 3.7 on Ubuntu 16.04 LTS.

In the simulation of V2V computation offloading, we consider a one-way road, where multiple vehicles drive in the same direction, and a BS deployed by the road takes charge of vehicular task allocation. We set the number of offloading tasks of a task vehicle in a period as 32, and simulate the performance of our proposed algorithm under different traffic densities in the communication range of the BS. Besides, in the simulation of blockchain consensus, we consider an area with 50 BSs in total, and the traffic density in the communication range of the BSs varies from 5 vehicles/km to 40 vehicles/km. We conduct multiple

simulations with different numbers of consensus nodes $N_c$, and the range of $N_c$ is set as [6, 30]. The other parameters in our simulation are summarized in Table 2.

TABLE 2
Simulation Parameters

Parameter                 Value
$\rho$                    5–40 vehicles/km
$F_s$                     [3, 7] GHz
$G_b^{\mathrm{total}}$    [20, 29] GHz
$D$                       500 m
$W_{ts}$                  10 MHz
$D_n$                     [0.2, 4] Mbits
$C_n$                     $[0.2, 3.2] \times 10^{9}$ cycles
$\tau_n$                  {0.5, 1, 2, 4} s
$x$                       2 KB
$\dot{S}_{bk}$            8 MB [42]
$a$                       2 Mcycles
$b$                       8 Mcycles [31]
$c$                       0.5 Mcycles [31]

6.2 Performance of V2V Computation Offloading

In the simulation of the vehicular computation offloading algorithm, the actor network $\pi_\psi(a|s)$ and the critic networks $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$ are all fully-connected deep neural networks (DNNs) with two hidden layers. We set the size of the hidden layers in $\pi_\psi(a|s)$ as (800, 500), and set the learning rate of $\pi_\psi(a|s)$ as $8 \times 10^{-4}$. Similarly, we set the size of the hidden layers in both $Q_{\theta_1}(s, a)$ and $Q_{\theta_2}(s, a)$ as (800, 500), and the learning rate of the two networks is set as $8 \times 10^{-4}$. In addition, the batch size is set as $|E| = 256$, and the delay factor is set as $\omega_s = 0.005$.

The complexity of the vehicular computation offloading algorithm is commensurate with that of the DNN model. Given a VEC scenario where the number of service vehicles is $S$, the dimension of the state space is $4S + 3$ and the dimension of the action space is $2S$ according to (18) and (19). Then, the complexity of action generation in the algorithm can be given as $O((4S + 3)h_1 + h_1 h_2 + 2 h_2 S)$, where $(h_1, h_2)$ is the size of the hidden layers in both actor network and critic network. Since the size of the hidden layers in actor and critic is fixed in our simulations, the complexity of action generation can be given as $O(S)$. Besides, in the training process of the algorithm, the complexity in each step is $O(S)$ as well. If there are $N_o$ offloading tasks in a period, and the training steps in a period are $N_t$, then the time complexity of the algorithm in a period is $O((N_o + N_t)S)$.

To verify the performance of our proposed algorithm, we introduce the following algorithms for comparison:

• Greedy task offloading (GTO): For each offloading task, the BS always chooses the service vehicle with the maximum idle computing resources, and gives the service price that maximizes the utility of the task.
• Random task offloading (RTO): For each offloading task, the BS randomly chooses a service vehicle, and randomly gives a service price in the price domain.

Fig. 4 shows the average utility of a task vehicle for offloading a task in our proposed algorithm with different learning rates. It can be seen that the average utility reaches convergence around 3,000 training episodes, and the algorithm with both learning rates (of the critic networks and of the actor network) equal to $8 \times 10^{-4}$ achieves the best performance compared with the algorithm with other learning rates.

Fig. 4. The average utility of task vehicle for offloading a task in the proposed algorithm with different learning rates.

As shown in Fig. 5, the average utility of task vehicle for offloading a task in our proposed algorithm is higher than that in GTO and RTO. This is because the optimization objective of our proposed algorithm is to maximize the expected long-term reward, and the reward is related to the utility of task vehicle, while in GTO, the agent always chooses the service vehicle with the maximum idle computing resources in each step, which may not obtain the maximum average utility of offloading tasks. Moreover, because GTO does not consider the reliability of service vehicles in task offloading, some tasks may be offloaded to service vehicles with low reliability and obtain fewer computing resources, which leads to the lower average utility of the task vehicle.

Fig. 5. The average utility of task vehicle for offloading a task under different traffic densities.

In Fig. 6, the completion ratio of offloading tasks by applying the three algorithms is presented. It can be seen that the completion ratio of offloading tasks by applying our proposed algorithm is higher than by applying GTO and RTO. Furthermore, with the traffic density increasing, there are more service vehicles available for task offloading, thus offloading tasks may obtain more computing resources from service vehicles, which increases both the average utility and the completion ratio of offloading tasks.

Figs. 7a and 7b present the average delays of the offloading tasks that have been successfully completed under the traffic density 10 vehicles/km and 30 vehicles/km, respectively. We can observe that under low traffic density, the

average delays of offloading tasks with $\tau_n \in \{0.5, 1, 2, 4\}$ in our algorithm are close to that in GTO, but the completion ratio of GTO is lower than that of our proposed algorithm as illustrated in Fig. 6. Moreover, under high traffic density, it can be seen that the average delays of offloading tasks with different deadlines by applying our proposed algorithm are all lower than those of the other two algorithms. This is because our proposed algorithm considers the reliability of service vehicles and meanwhile maximizes the expected long-term reward that is related to the offloading delay, while the other two algorithms do not consider these aspects in vehicular task allocation.

In addition, Fig. 8 presents the probability of being selected as a service vehicle for an offloading task and the average utility per period of service vehicles with different reliabilities by applying our proposed algorithm. It can be seen that vehicles with higher reliability are more likely to be chosen as the service vehicles for offloading tasks. Meanwhile, service vehicles with higher reliability can obtain higher utility in a period, because they can receive more offloading tasks in vehicular task offloading.

Fig. 6. The completion ratio of offloading tasks under different traffic densities.

Fig. 7. The average delay of offloading tasks with different maximum tolerable delays.

Fig. 8. The average utility per period and the selection probability of service vehicles (SVs) with different reliabilities when the traffic density is 30 vehicles/km.

6.3 Performance of Consensus Process

In the simulation of the consensus nodes selection algorithm, we utilize fully-connected DNNs to represent the Q-networks. Both the main Q-network $Q(s, a; \theta)$ and target Q-network $Q^{-}(s, a; \theta^{-})$ have two hidden layers with size (1000, 1000), and the learning rate of the algorithm is set as $5 \times 10^{-5}$. Besides, we set the batch size $|E'| = 256$, and set the delay factor $\omega_d = 0.01$.

The complexity of the consensus nodes selection algorithm is commensurate with that of the DNN model as well. Similar to the complexity analysis of Algorithm 1, the complexity of action generation and training in each step is $O(N_B)$, where $N_B$ is the number of BSs in the system.

To validate the performance of the DRL-based algorithm of consensus nodes selection, we provide three benchmark algorithms for comparison:

• Traditional PBFT (PBFT): In each round of consensus, the consensus nodes are preassigned and do not vary with the rounds of consensus. The primary node in each round of consensus is selected in a round-robin manner, and the block size is selected to maximize the average reward of the consensus nodes.
• Greedy node selection (GNS): In each round of consensus, GNS always selects $N_c$ BSs with the most available computing resources as the consensus nodes, and chooses the block size that maximizes the average reward of the consensus nodes.
• Random node selection (RNS): In each round of consensus, RNS selects $N_c$ BSs in a random manner, and chooses the block size that maximizes the average reward of the consensus nodes.

Fig. 9 shows the average utility of consensus nodes in a round of consensus in our proposed algorithm with different learning rates. We can notice that our proposed algorithm

with a learning rate of $5 \times 10^{-5}$ or $1 \times 10^{-4}$ reaches convergence around 2,000 training episodes. Besides, it can be seen from Fig. 9 that when the learning rate becomes smaller, the proposed algorithm needs to take more episodes to achieve convergence. When the learning rate becomes large enough, the proposed algorithm falls into a local optimum and cannot achieve the best performance.

Fig. 9. The average utility of consensus nodes in a round of consensus in the proposed algorithm with different learning rates.

Fig. 11. The average utility and throughput of blockchain consensus under different generation rates of transactions.
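The training step whose learning rate is varied in Fig. 9 can be illustrated with a minimal, dependency-free Python sketch. It assumes the standard double Q-learning target of [41] (the main Q-network picks the next action, the target Q-network scores it) together with the soft target update $\theta^{-} = \omega_d \theta + (1 - \omega_d)\theta^{-}$ with delay factor $\omega_d = 0.01$. The function names, the `discount` parameter, and the numeric Q-values are hypothetical and chosen only for illustration; they are not taken from the simulations.

```python
from typing import List, Sequence


def ddqn_target(reward: float,
                next_q_main: Sequence[float],
                next_q_target: Sequence[float],
                discount: float,
                done: bool = False) -> float:
    """Double-DQN target: the main network selects the action,
    the target network evaluates it."""
    if done:
        return reward
    # Action that the main Q-network considers best in the next state.
    best_action = max(range(len(next_q_main)), key=lambda a: next_q_main[a])
    # The target Q-network supplies the value of that action.
    return reward + discount * next_q_target[best_action]


def soft_update(main_weights: Sequence[float],
                target_weights: Sequence[float],
                omega_d: float = 0.01) -> List[float]:
    """Move the target weights a small step toward the main weights:
    omega_d * theta + (1 - omega_d) * theta_target."""
    return [omega_d * w + (1.0 - omega_d) * w_t
            for w, w_t in zip(main_weights, target_weights)]


if __name__ == "__main__":
    # Hypothetical Q-values of three actions in the next state.
    y = ddqn_target(reward=1.0,
                    next_q_main=[0.2, 0.9, 0.5],
                    next_q_target=[0.4, 0.3, 0.8],
                    discount=0.9)
    print("target value:", y)
    print("updated target weights:", soft_update([1.0, 2.0], [0.0, 0.0]))
```

In this sketch the main network prefers the second action, so the target uses the target network's estimate for that action rather than the target network's own maximum, which is how double Q-learning reduces overestimation. A small $\omega_d$ keeps the target weights changing slowly between training steps.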
As shown in Fig. 10a, the average utility of consensus nodes in our proposed algorithm is higher than that in the benchmark algorithms, because our proposed algorithm considers both the available computing resources and the reliability of BSs, and jointly optimizes the consensus nodes selection and block size, while the benchmark algorithms do not consider the reliability of the BSs in the selection of consensus nodes. Additionally, we can notice that the performance of RNS is close to that of PBFT, because neither RNS nor PBFT utilizes the state information of BSs in the consensus nodes selection.

Fig. 10b presents the throughput of blockchain consensus with different numbers of consensus nodes by applying our proposed algorithm and benchmark algorithms. We can notice that GNS achieves higher throughput than the other algorithms, because GNS chooses the BSs with the maximum available computing resources and does not consider the reliability of BSs, which makes the block verification completed in the least time. Besides, we can notice that the throughput of consensus nodes decreases as the number of consensus nodes increases. This is because, with more consensus nodes, the BSs with fewer available computing resources have more opportunities to participate in the consensus, and the time of block verification in a round of consensus depends on the minimum available computing resources of the consensus nodes.

Fig. 10. The average utility and throughput of consensus nodes in a round of consensus with different numbers of consensus nodes.

Figs. 11a and 11b show the average utility and the throughput of blockchain consensus under different generation rates of transactions. From Fig. 11a, it can be seen that the average utility of consensus nodes in our proposed algorithm is higher than that in other algorithms under different generation rates of transactions. Moreover, since the computing resources in BSs are limited, with the generation rates of transactions increasing, the average utility of consensus nodes gradually becomes larger and then tends to be stable. From Fig. 11b, we can notice that with the generation rates of transactions increasing, the throughput of blockchain consensus gradually increases and then tends to be stable in our proposed algorithm, while the throughput of blockchain consensus in GNS remains unchanged. This is because GNS always chooses the BSs with the most computing resources as the consensus nodes, so the throughput of GNS represents the upper bound of the throughput of consensus.

Fig. 12. The average utility and selection probability of BSs with different reliabilities in blockchain consensus.

In addition, Fig. 12 shows that a BS with higher reliability has a higher probability of being selected as a consensus node in our proposed algorithm. Meanwhile, a BS with higher reliability can receive more rewards from blockchain by participating in the consensus process. As presented in (29), the utility of a consensus node not only depends on the consensus delay and the number of validated transactions, but also depends on the reliability of the BS. Since the objective of our proposed algorithm is to maximize the average utility of consensus nodes, the agent tends to select BSs with high reliability as the consensus nodes. Besides, through the reward from blockchain, BSs are motivated to improve their reliability in vehicular task allocation.
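As a rough illustration of why GNS bounds the throughput from above and why throughput falls as $N_c$ grows, the node-selection part of the GNS and RNS baselines can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the BS identifiers and resource values (in GHz) are hypothetical, the helper names are invented for this example, and the block-size choice that both baselines also make is omitted.

```python
import random
from typing import Dict, List, Optional


def greedy_node_selection(resources: Dict[str, float], n_c: int) -> List[str]:
    """GNS: select the n_c BSs with the most available computing resources."""
    ranked = sorted(resources, key=lambda bs: resources[bs], reverse=True)
    return ranked[:n_c]


def random_node_selection(resources: Dict[str, float], n_c: int,
                          rng: Optional[random.Random] = None) -> List[str]:
    """RNS: select n_c BSs uniformly at random, ignoring their state."""
    rng = rng or random.Random()
    return rng.sample(sorted(resources), n_c)


def bottleneck_resource(resources: Dict[str, float], nodes: List[str]) -> float:
    """The selected BS with the fewest resources limits block verification time."""
    return min(resources[bs] for bs in nodes)


if __name__ == "__main__":
    # Hypothetical available computing resources of four BSs, in GHz.
    available = {"bs1": 21.0, "bs2": 28.5, "bs3": 24.0, "bs4": 20.5}
    for n_c in (2, 3, 4):
        gns = greedy_node_selection(available, n_c)
        rns = random_node_selection(available, n_c, random.Random(0))
        print(n_c, gns, bottleneck_resource(available, gns),
              rns, bottleneck_resource(available, rns))
```

Because verification time is governed by the slowest selected BS, the greedy choice always has the largest possible bottleneck value for a given $N_c$, and that value can only stay the same or shrink as more nodes are added. Neither baseline looks at BS reliability, which is the aspect the proposed algorithm adds.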

7 CONCLUSION

In this paper, we have investigated the integration of blockchain and VEC, and proposed a vehicular task allocation scheme to ensure secure and reliable computation offloading among vehicles in the smart contract. To make the policy of computation offloading adapt to the dynamic vehicular environment, we formulated the problem of vehicular task allocation as a sequential decision problem, and developed a DRL-based algorithm to solve the problem. Furthermore, to improve the efficiency of blockchain consensus and motivate BSs to improve the reliability in vehicular task allocation, we proposed an enhanced consensus protocol based on PBFT, and then designed a DDQN-based algorithm for consensus nodes selection. Finally, simulation results revealed that the proposed algorithm can effectively improve the performance of V2V computation offloading and blockchain consensus, and meanwhile motivate BSs to improve their reliability in vehicular resource allocation.

REFERENCES

[1] X. Wang, Z. Ning, S. Guo, and L. Wang, “Imitation learning enabled task scheduling for online vehicular edge computing,” IEEE Trans. Mobile Comput., vol. 21, no. 2, pp. 598–611, Feb. 2022.
[2] J. Du, C. Jiang, J. Wang, Y. Ren, and M. Debbah, “Machine learning for 6G wireless networks: Carry-forward-enhanced bandwidth, massive access, and ultrareliable/low latency,” IEEE Veh. Technol. Mag., vol. 15, no. 4, pp. 123–134, Dec. 2020.
[3] X. Peng, K. Ota, and M. Dong, “Multiattribute-based double auction toward resource allocation in vehicular fog computing,” IEEE Internet Things J., vol. 7, no. 4, pp. 3094–3103, Apr. 2020.
[4] Z. Ning et al., “Partial computation offloading and adaptive task scheduling for 5G-enabled vehicular networks,” IEEE Trans. Mobile Comput., early access, Sep. 18, 2020, doi: 10.1109/TMC.2020.3025116.
[5] Z. Gao, M. Liwang, S. Hosseinalipour, H. Dai, and X. Wang, “A truthful auction for graph job allocation in vehicular cloud-assisted networks,” IEEE Trans. Mobile Comput., early access, Feb. 16, 2021, doi: 10.1109/TMC.2021.3059803.
[6] J. Du, C. Jiang, H. Zhang, Y. Ren, and M. Guizani, “Auction design and analysis for SDN-based traffic offloading in hybrid satellite-terrestrial networks,” IEEE J. Sel. Areas Commun., vol. 36, no. 10, pp. 2202–2217, Oct. 2018.
[7] H. Xu, W. Huang, Y. Zhou, D. Yang, M. Li, and Z. Han, “Edge computing resource allocation for unmanned aerial vehicle assisted mobile network with blockchain applications,” IEEE Trans. Wireless Commun., vol. 20, no. 5, pp. 3107–3121, May 2021.
[8] Z. Xiong, Y. Zhang, D. Niyato, P. Wang, and Z. Han, “When mobile blockchain meets edge computing,” IEEE Commun. Mag., vol. 56, no. 8, pp. 33–39, Aug. 2018.
[9] Z. Xiong, S. Feng, W. Wang, D. Niyato, P. Wang, and Z. Han, “Cloud/fog computing resource management and pricing for blockchain networks,” IEEE Internet Things J., vol. 6, no. 3, pp. 4585–4600, Jun. 2019.
[10] M. B. Mollah et al., “Blockchain for the internet of vehicles towards intelligent transportation systems: A survey,” IEEE Internet Things J., vol. 8, no. 6, pp. 4157–4185, Mar. 2021.
[11] J. Feng, F. R. Yu, Q. Pei, J. Du, and L. Zhu, “Joint optimization of radio and computational resources allocation in blockchain-enabled mobile edge computing systems,” IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 4321–4334, Jun. 2020.
[12] F. Guo, F. R. Yu, H. Zhang, H. Ji, M. Liu, and V. C. M. Leung, “Adaptive resource allocation in future wireless networks with blockchain and mobile edge computing,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 1689–1703, Mar. 2020.
[13] R. Yang, F. R. Yu, P. Si, Z. Yang, and Y. Zhang, “Integrated blockchain and edge computing systems: A survey, some research issues and challenges,” IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1508–1532, Second Quarter 2019.
[14] Z. Zhuang, J. Wang, Q. Qi, J. Liao, and Z. Han, “Adaptive and robust routing with Lyapunov-based deep RL in MEC networks enabled by blockchains,” IEEE Internet Things J., vol. 8, no. 4, pp. 2208–2225, Feb. 2021.
[15] H. Chai, S. Leng, K. Zhang, and S. Mao, “Proof-of-reputation based-consortium blockchain for trust resource sharing in Internet of Vehicles,” IEEE Access, vol. 7, pp. 175744–175757, 2019.
[16] Y. Dai, D. Xu, K. Zhang, S. Maharjan, and Y. Zhang, “Deep reinforcement learning and permissioned blockchain for content caching in vehicular edge computing and networks,” IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4312–4324, Apr. 2020.
[17] X. Zhang and X. Chen, “Data security sharing and storage based on a consortium blockchain in a vehicular ad-hoc network,” IEEE Access, vol. 7, pp. 58241–58254, 2019.
[18] S. Wang, D. Ye, X. Huang, R. Yu, Y. Wang, and Y. Zhang, “Consortium blockchain for secure resource sharing in vehicular edge computing: A contract-based approach,” IEEE Trans. Netw. Sci. Eng., vol. 8, no. 2, pp. 1189–1201, Second Quarter 2021.
[19] J. Feng, Z. Liu, C. Wu, and Y. Ji, “AVE: Autonomous vehicular edge computing framework with ACO-based scheduling,” IEEE Trans. Veh. Technol., vol. 66, no. 12, pp. 10660–10675, Dec. 2017.
[20] F. Sun et al., “Cooperative task scheduling for computation offloading in vehicular cloud,” IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 11049–11061, Nov. 2018.
[21] L. T. Tan and R. Q. Hu, “Mobility-aware edge caching and computing in vehicle networks: A deep reinforcement learning,” IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 10190–10203, Nov. 2018.
[22] J. Zhao, M. Kong, Q. Li, and X. Sun, “Contract-based computing resource management via deep reinforcement learning in vehicular fog computing,” IEEE Access, vol. 8, pp. 3319–3329, 2020.
[23] X. Chen, L. Zhang, Y. Pang, B. Lin, and Y. Fang, “Timeliness-aware incentive mechanism for vehicular crowdsourcing in smart cities,” IEEE Trans. Mobile Comput., early access, Jan. 19, 2021, doi: 10.1109/TMC.2021.3052963.
[24] H. Liao, Y. Mu, Z. Zhou, M. Sun, Z. Wang, and C. Pan, “Blockchain and learning-based secure and intelligent task offloading for vehicular fog computing,” IEEE Trans. Intell. Transp. Syst., vol. 22, no. 7, pp. 4051–4063, Jul. 2021.
[25] X. Lin, J. Wu, S. Mumtaz, S. Garg, J. Li, and M. Guizani, “Blockchain-based on-demand computing resource trading in IoV-assisted smart city,” IEEE Trans. Emerg. Topics Comput., vol. 9, no. 3, pp. 1373–1385, Third Quarter 2021.
[26] K. Xiao, W. Shi, Z. Gao, C. Yao, and X. Qiu, “DAER: A resource preallocation algorithm of edge computing server by using blockchain in intelligent driving,” IEEE Internet Things J., vol. 7, no. 10, pp. 9291–9302, Oct. 2020.
[27] S. Iqbal, A. W. Malik, A. U. Rahman, and R. M. Noor, “Blockchain-based reputation management for task offloading in micro-level vehicular fog network,” IEEE Access, vol. 8, pp. 52968–52980, 2020.
[28] C. Qiu, H. Yao, F. R. Yu, C. Jiang, and S. Guo, “A service-oriented permissioned blockchain for the Internet of Things,” IEEE Trans. Services Comput., vol. 13, no. 2, pp. 203–215, Mar./Apr. 2020.
[29] J. Luo, Q. Chen, F. R. Yu, and L. Tang, “Blockchain-enabled software-defined industrial Internet of Things with deep reinforcement learning,” IEEE Internet Things J., vol. 7, no. 6, pp. 5466–5480, Jun. 2020.
[30] M. Castro and B. Liskov, “Practical byzantine fault tolerance,” in Proc. 3rd Symp. Oper. Syst. Des. Implementation, 1999, pp. 173–186.
[31] C. Qiu, F. R. Yu, H. Yao, C. Jiang, F. Xu, and C. Zhao, “Blockchain-based software-defined industrial Internet of Things: A dueling deep Q-learning approach,” IEEE Internet Things J., vol. 6, no. 3, pp. 4627–4639, Jun. 2019.
[32] A. Clement, E. Wong, L. Alvisi, M. Dahlin, and M. Marchetti, “Making byzantine fault tolerant systems tolerate byzantine faults,” in Proc. 6th USENIX Symp. Netw. Syst. Des. Implementation, 2009, pp. 153–168.
[33] M. Liu, F. R. Yu, Y. Teng, V. C. M. Leung, and M. Song, “Performance optimization for blockchain-enabled industrial internet of things (IIoT) systems: A deep reinforcement learning approach,” IEEE Trans. Ind. Informat., vol. 15, no. 6, pp. 3559–3570, Jun. 2019.
[34] W. L. Tan, W. C. Lau, O. Yue, and T. H. Hui, “Analytical models and performance evaluation of drive-thru internet systems,” IEEE J. Sel. Areas Commun., vol. 29, no. 1, pp. 207–222, Jan. 2011.
[35] J. Du, E. Gelenbe, C. Jiang, H. Zhang, and Y. Ren, “Contract design for traffic offloading and resource allocation in heterogeneous ultra-dense networks,” IEEE J. Sel. Areas Commun., vol. 35, no. 11, pp. 2457–2467, Nov. 2017.
[36] J. Zhao, Q. Li, Y. Gong, and K. Zhang, “Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944–7956, Aug. 2019.

[37] X. Huang, R. Yu, J. Kang, and Y. Zhang, “Distributed reputation management for secure and efficient vehicular edge computing and networks,” IEEE Access, vol. 5, pp. 25408–25420, 2017.
[38] M. Liu, F. R. Yu, Y. Teng, V. C. M. Leung, and M. Song, “Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 695–708, Jan. 2019.
[39] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. 35th Int. Conf. Mach. Learn., 2018, pp. 1861–1870.
[40] T. Haarnoja et al., “Soft actor-critic algorithms and applications,” 2018, arXiv: 1812.05905.
[41] H. Van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with double Q-learning,” in Proc. 30th AAAI Conf. Artif. Intell., 2016, pp. 2094–2100.
[42] J. Feng, F. R. Yu, Q. Pei, X. Chu, J. Du, and L. Zhu, “Cooperative computation offloading and resource allocation for blockchain-enabled mobile-edge computing: A deep reinforcement learning approach,” IEEE Internet Things J., vol. 7, no. 7, pp. 6214–6228, Jul. 2020.

Jinming Shi (Student Member, IEEE) received the BE degree in electronic engineering in 2015 from Tsinghua University, Beijing, China, where he is currently working toward the PhD degree in electronic engineering. His research interests include Internet of Vehicles, blockchain, vehicular fog computing, and deep reinforcement learning.

Jun Du (Senior Member, IEEE) received the BS degree in information and communication engineering from the Beijing Institute of Technology in 2009, and the MS and PhD degrees in information and communication engineering from Tsinghua University, Beijing, in 2014 and 2018, respectively. From October 2016 to September 2017, she was a sponsored researcher, and she visited Imperial College London. She is currently an assistant professor with the Department of Electrical Engineering, Tsinghua University. Her research interests include communications, networking, resource allocation and system security problems of heterogeneous networks and space-based information networks.

Yuan Shen (Senior Member, IEEE) received the BE degree in electronic engineering from Tsinghua University in 2005, and the SM and PhD degrees in electrical engineering and computer science from the Massachusetts Institute of Technology in 2008 and 2014, respectively. He is currently a professor and the vice chair with the Department of Electronic Engineering, Tsinghua University. His current research interests include network localization and navigation, integrated sensing and control, resource allocation, and cooperative systems. His papers received the IEEE ComSoc Fred W. Ellersick Prize and three best paper awards from IEEE conferences.

Jian Wang (Senior Member, IEEE) received the PhD degree in electronic engineering from Tsinghua University, Beijing, China, in 2006. He is currently a professor with the Department of Electronic Engineering, Tsinghua University. His research interests include application of statistical theories, optimization, machine learning to communication, networking, navigation, and resource allocation problems.

Jian Yuan (Member, IEEE) received the PhD degree in electrical engineering from the University of Electronic Science and Technology of China, in 1998. He is currently a professor with the Department of Electronic Engineering at Tsinghua University, Beijing, China. His research focuses on complex dynamics of networked systems.

Zhu Han (Fellow, IEEE) received the BS degree in electronic engineering from Tsinghua University, in 1997, and the MS and PhD degrees in electrical and computer engineering from the University of Maryland, College Park, in 1999 and 2003, respectively. He is currently a John and Rebecca Moores professor with the Electrical and Computer Engineering Department and with the Computer Science Department, University of Houston, Texas. His research interests include wireless resource allocation and management, wireless communications and networking, game theory, Big Data analysis, security, and smart grid. He was the recipient of the NSF Career Award in 2010, Fred W. Ellersick Prize of the IEEE Communication Society in 2011, EURASIP Best Paper Award for the Journal on Advances in Signal Processing in 2015, IEEE Leonard G. Abraham Prize in the field of Communications Systems (Best Paper Award in IEEE JSAC) in 2016, and several best paper awards in IEEE conferences. He was an IEEE Communications Society distinguished lecturer from 2015 to 2018, and has been an AAAS fellow since 2019 and an ACM distinguished Member since 2019. He has been a 1% highly cited researcher since 2017 according to Web of Science. He was also the winner of the 2021 IEEE Kiyo Tomiyasu Award, for outstanding early to mid-career contributions to technologies holding the promise of innovative applications.
