
IEEE SYSTEMS JOURNAL, VOL. 18, NO. 1, MARCH 2024

Multiagent Deep Reinforcement Learning Based Incentive Mechanism for Mobile Crowdsensing in Intelligent Transportation Systems

Mengge Li, Student Member, IEEE, Miao Ma, Liang Wang, Member, IEEE, Zhao Pei, Member, IEEE, Jie Ren, Member, IEEE, and Bo Yang

Abstract—Mobile crowdsensing in intelligent transportation systems is a new data acquisition mode that utilizes vehicles to sense a changing traffic environment, where incentive mechanisms play a vital role in motivating vehicles to participate. Nevertheless, existing incentive mechanisms do not simultaneously consider enhancing the efficiency of task allocation, maximizing the benefits of the entire system, and balancing various costs. Therefore, a two-stage incentive mechanism, considering task aggregation, is designed to maximize the benefits of vehicles and the platform. In the first stage, each vehicle, utilizing the multiagent deep reinforcement learning algorithm, dynamically adjusts its path selection strategy to maximize its net income and then uploads the selected path information to the platform. In the second stage, the platform solves an NP-hard problem of maximizing satisfaction by combining the bid price, time consumption, and task coverage to assign tasks to the appropriate vehicles. In particular, vehicle satisfaction is measured by the analytic hierarchy process, and rewards are allocated accordingly. The proposed mechanism is compared with the state-of-the-art method and the greedy, random, and optimal algorithms. Experiments on a real trajectory dataset manifest that the proposed mechanism can fully use the platform budget to achieve the best net income, platform satisfaction, and task completion rate.

Index Terms—Benefits of vehicles and the platform, incentive mechanism, intelligent transportation systems (ITS), mobile crowdsensing (MCS), multiagent deep reinforcement learning (MADRL).

Manuscript received 14 April 2023; revised 22 November 2023; accepted 3 January 2024. Date of publication 25 January 2024; date of current version 15 March 2024. This work was supported in part by the National Natural Science Foundation of China under Grant U2001205, Grant 62377031, and Grant 62071283, in part by the Key Research and Development Program in Shaanxi Province under Grant 2023-YBGY241, and in part by the Excellent Graduate Training Program of Shaanxi Normal University under Grant LHRCTS23096. (Corresponding authors: Miao Ma; Bo Yang.)

Mengge Li, Liang Wang, Zhao Pei, Jie Ren, and Bo Yang are with the School of Computer Science, Shaanxi Normal University, Xi'an 710119, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

Miao Ma is with the Key Laboratory of Modern Teaching Technology, Ministry of Education, Xi'an 710062, China, and also with the School of Computer Science, Shaanxi Normal University, Xi'an 710119, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSYST.2024.3351310

I. INTRODUCTION

RECENTLY, with the rapid improvement of people's economic level, the number of vehicles worldwide has witnessed explosive growth, which leads to serious traffic congestion. Traffic congestion increases travel time and fuel costs, and intensifies traffic accidents. Aiming to alleviate these problems, intelligent transportation systems (ITS) [1] enable traditional transportation to be more informative, intelligent, and socialized. With the advent of ITS, intelligent vehicles, deployed with plentiful sensors, can collaboratively sense the rapidly changing environment, collect valuable information to better support people's safe travel, and improve people's driving experience, which enables mobile crowdsensing (MCS) [2], [3], [4] in ITS to become a new paradigm. Different from the conventional static sensor network, MCS in ITS uses vehicles to perform cheap, real-time, and large-scale sensing operations. A common MCS system is made up of a platform, a crowd of task initiators (TIs), and participants. Fig. 1 depicts a vehicular crowdsensing scenario. TIs, usually individuals or organizations, issue tasks [5], [6], [7] to the platform. Afterward, the platform assigns published tasks to suitable vehicle participants. The selected vehicles replan their trips to facilitate the task execution without affecting the scheduling and contribute the collected sensing data to the platform. At last, the platform summarizes the received results and feeds them back to the TIs. In addition, the platform also controls the payoff based on the submission results of participants.

[Fig. 1. Scenario of vehicular crowdsensing.]

The extensive participation of vehicles is the key to the success of MCS in ITS. In order to complete the specified tasks, vehicles need to spend their time and consume the devices' energy, so the


motivation of the vehicles to participate in the tasks is very low unless satisfactory compensation can be obtained. Therefore, the design of incentive mechanisms plays a vital role in stimulating vehicles to take part in sensing activities and submit high-quality results [8], [9], [10].

Most previous works about task assignment and incentive mechanisms focus on individual tasks, where each task is done independently. In reality, multiple tasks are often coupled with each other in common resource collections. To realize the full utilization of resources, research works [11], [12] on multitask allocation have attracted much attention recently, but task characteristics are still ignored. In practice, clustering features in terms of geographic location distribution (such as gathering information on parking lots near department stores) are common in scenarios with a number of tasks. We call such tasks clustering tasks. When these tasks are handled uniformly, the effectiveness of task allocation will be greatly optimized.

Furthermore, most available multitask allocation strategies either aim at the interests of the platform, such as maximizing the utility or welfare of the platform [11], [13], [14], [15], [16], [17], or optimize the interests of the participants, for instance, improving the schedule of performed tasks or the benefit of the participants [8], [9], [18], [19], [20], [21]. However, only when the platform's and participants' interests are simultaneously considered can the overall interests of the entire system be improved to the fullest extent. Meanwhile, path planning is crucial for maximizing the benefits of the vehicle participants. Nevertheless, the available MCS schemes in ITS ignore the study of vehicle path planning.

Individual optimization goals, such as improving the quality of sensing results [22], reducing expenditure [23], [24], or reducing delay, only apply to the requirements of some specific applications. For example, only considering improving the quality of sensing results may cause large waiting delays and recruitment costs, which is intolerable for most other applications on the platform. Therefore, during task assignment, it is meaningful to simultaneously consider the bid price, time consumption, and task coverage to optimize the monetary expenditure, the time consumption for completing the tasks, and the communication cost of the system.

To overcome the above-mentioned research gaps, we design a two-stage incentive mechanism based on task composition performed on geographic location to optimize the interests of vehicles and the platform simultaneously. First, the article maximizes the net income of vehicles by selecting appropriate paths for each vehicle based on multiagent deep reinforcement learning (MADRL), which has proved to be very effective in vehicle path planning [25]. Then, the satisfaction of the platform, as a combination of the bid price, time consumption, and the number of covering tasks, is maximized by assigning the clustering tasks to appropriate bidding vehicles. Based on the optimization of the above two points, the global interests of the whole MCS system can be maximized. In detail, the main contributions of the article are as follows.

1) We develop a framework utilizing multiagent deep deterministic policy gradient (MADDPG), a typical MADRL technology that is suited to the rapidly changing surrounding traffic and the uncertainty of parameters, to make each vehicle learn the optimal path selection policy to maximize its net income through constant observation and adjustment.

2) By considering the bid price, time consumption, and task coverage of the paths submitted by each vehicle, we refine the satisfaction that each vehicle can bring to the platform using the analytic hierarchy process (AHP). Based on this, we propose the maximum-platform-satisfaction winning-bid selection (MSBS) problem, proved to be NP-hard, in MCS.

3) A high-efficiency task allocation algorithm, namely, the optimized allocation strategy of budget-constrained tasks (SBCT), is proposed, which is expected to make the utmost of the budget and maximize platform satisfaction. The vehicles that perform the tasks are rewarded based on satisfaction to compensate for their cost.

4) We perform extensive simulations on an actual vehicle trajectory dataset. Simulation experiments demonstrate that the proposed MADDPG and SBCT incentive mechanism can achieve approximately optimal performance in terms of the total net income, the platform's satisfaction, the task completion rate, and the average remaining budget.

The rest of this article is organized as follows. Related work is summarized in Section II. Section III introduces the system model and formulates the vehicle path selection problem. We describe the proposed MADDPG-based path selection strategy in Section IV. The MSBS problem and the proposed optimized task allocation strategy are given in Section V. Extensive experiments in Section VI evaluate the proposed strategies. Finally, Section VII concludes this article.

II. RELATED WORK

TIs, the platform, and participants are the three key roles in MCS. Many available task allocation schemes and incentive mechanisms have been proposed based on the interests of TIs, the platform, and participants, respectively.

A. Incentive Mechanism Based on the TIs

From the perspective of the TIs, Wang et al. [12] designed a two-stage task distribution approach to improve the quality of sensing results and economize the budget. Hui et al. [24] constructed a centerless collaborative MCS framework using blockchain and a coalition formation algorithm to minimize the cost. Tan et al. [26] developed a three-stage framework that exploits realistic relationships in social networks to form groups to improve task coverage and cooperation quality. Tang et al. [27] developed an algorithm using reinforcement learning (RL) to improve spatial and temporal coverage. Liu et al. [28] developed a cooperative information sensing framework utilizing edge intelligence to improve spatial-temporal evenness. Cao et al. [29] developed an MCS-oriented motivation strategy to encourage nearby participants to share their resources and a task migration method to optimize the utility. Li et al. [30] proposed concurrent task and safety emergency task allocation methods based on RL to optimize the utility of concurrent tasks while meeting the requirements of safety emergency tasks. Wang et al. [31] designed a participant recruitment scheme with multiple stages using the combinatorial multiarmed bandit algorithm to optimize the task execution effect. Wang et al. [32] proposed a distributed MADRL framework for multiunmanned aerial vehicle trajectory planning to jointly minimize the


age-of-information threshold and maximize the collection ratio.

[TABLE I. Comparison of crowdsensing schemes.]

B. Incentive Mechanism Based on the Platform

From the perspective of the platform, Li and Zhang [11] researched the multitask-oriented distribution problem and developed two evolutionary methods to optimize the platform's utility. Liu et al. [15] developed a model utilizing dual auction to optimize the TIs' and the platform's global utility. Pei et al. [17] developed an online algorithm utilizing Lyapunov optimization to stabilize dynamic MCS systems and optimize the platform's utility. Dong et al. [33] designed an optimal MCS incentive considering sensing inaccuracy to improve the platform's utility. Jiang et al. [34] proposed an incentive mechanism based on Q learning and a dynamic pricing strategy to effectively utilize the platform budget and maximize platform utility. Ji et al. [35] studied two scenarios of online social awareness and social unawareness, and proposed an incentive mechanism to select an optimal group of users that can not only meet the skill requirements of the tasks but also maximize the platform's utility. Huang et al. [13] constructed a high-efficiency task distribution algorithm that could exploit the sensing capabilities of each participant and improve the platform's profit. Liu et al. [16] constructed a double deep Q network combining the dueling model to obtain a good payment scheme for the platform and optimize the platform's revenue. Xu and Song [36] proposed an MADRL-based approach, called communication-QMIX-based multiagent DRL (CQDRL), to solve the task allocation problem in a decentralized manner and help the platform maximize total profit. Wang et al. [14] developed a victorious team selection strategy utilizing reverse auction and adopted a knapsack-based approach to improve the welfare of the system.

C. Incentive Mechanism Based on the Participants

From the perspective of the participants, Gu et al. [8] described the rival interaction between participants and the MCS platform as a multistage Stackelberg game, where each participant observes transaction records and constantly adjusts its pricing. Cheung et al. [19] proposed dynamic programming for delay-sensitive sensing tasks to maximize participants' profits. Li et al. [20] developed an ant colony-based scheduling algorithm for multitasks to optimize participants' benefits. Tao and Song [21] designed a strategy utilizing a genetic algorithm to improve data quality and a detective algorithm to promote participants' profits. Zhao and Liu [9] designed an incentive strategy considering social awareness based on RL to optimize the participants' utility. Huang et al. [37] designed a two-stage game model to help requesters appropriately price the sensing tasks and to help users determine their participation levels for the tasks to maximize the utility of requesters and mobile users. Deng et al. [18] studied the spatial crowdsourcing problem, whose target is to seek a strategy to optimize the number of tasks executed by participants.

In summary, there are a lot of good studies on task assignment and incentive mechanisms, as given in Table I. However, these works do not simultaneously consider the interests of the platform and participants, and do not fully take the heterogeneity of participants, from the perspective of the bid price, time consumption, and task coverage of their submitted paths, into account. Therefore, to bridge this gap, this article optimizes the interests of vehicle participants by designing an MADDPG-based path selection algorithm and formulates a satisfaction index by combining the bid price, time consumption, and task coverage of vehicle participants to meet different applications' requirements. Meanwhile, SBCT is proposed to maximize the satisfaction obtained by the platform.

III. SYSTEM OVERVIEW AND PROBLEM FORMULATION

In this section, an overview of MCS in ITS is first presented. Then, we formulate the vehicle path selection problem.

A. System Overview

We focus on an MCS platform in ITS that utilizes vehicles to sense plentiful information. The operational process consists of multiple sensing rounds, in which each sensing round is divided into area division, task information broadcasting, vehicle bidding, task assignment, and task execution, as shown in Fig. 2. In the first stage, the platform decomposes each task published by the TIs into multiple subtasks and divides the large-scale area into multiple sensing subareas. Each subarea is called an area of interest (AoI). The subtasks located in the same AoI are aggregated into a new clustering task. In the task information broadcasting stage, the MCS platform broadcasts the clustering tasks to the vehicles. In the vehicle bidding stage, vehicles interested in the received tasks plan their paths according to their current positions, destinations, and deadlines, and upload the information to the MCS platform.


[Fig. 2. Five stages in each sensing round.]

After that, the MCS platform assigns unexecuted tasks to appropriate vehicles. Finally, in the task execution stage, each vehicle completes the assigned tasks and contributes valuable information. The unfinished tasks can be left to the next sensing round. The above-mentioned processes are carried out in turn until all tasks are completed.

In the vehicle bidding stage, each vehicle plans its path according to its preference and deadline, bids on the tasks on the selected path, and uploads the path information to the platform. Hence, the following two assumptions are made.

Assumption 1: The time required to complete the tasks is far less than the travel expenditure; when a car executes more tasks, more additional rewards can be obtained. Therefore, a vehicle generally wants to accomplish all tasks on a path.

Assumption 2: If the total time spent on the path exceeds its deadline to the destination, each vehicle will refuse to select the path because of rationality.

The vehicle path planning in the vehicle bidding stage and the task assignment stage are the focus of this article. Below, we introduce them in detail for a sensing round. The symbols commonly used in this article and their descriptions are given in Table II.

[TABLE II. Summary of notation.]

B. Problem Formulation

In each sensing round, we consider that the total number of vehicles and the total number of sensing tasks are represented by $V$ and $S$, respectively. Then, $\mathcal{V} = \{1, 2, \ldots, v, \ldots, V\}$ and $\mathcal{S} = \{1, 2, \ldots, s, \ldots, S\}$ are used to represent the set of vehicles and the set of sensing tasks, respectively. Each vehicle $v$, $\forall v \in \mathcal{V}$, travels from its current location to its destination, performs tasks when it passes each AoI during the journey, obtains the payoff, and must reach the destination before the deadline. The deadline of vehicle $v$ to reach the destination is denoted as $D_v$. Vehicle $v$ has $l_v$ possible paths to the destination, which are denoted as $\{P_1^v, P_2^v, \ldots, P_k^v, \ldots, P_{l_v}^v\}$, and each path covers a set of tasks. The tasks passed by different paths of vehicle $v$ are expressed as $\{S_1^v, S_2^v, \ldots, S_k^v, \ldots, S_{l_v}^v\}$, where $S_k^v$ is the task set passed by the $k$th path of vehicle $v$, $1 \le k \le l_v$. The tasks in $S_k^v$, covered by path $P_k^v$, are executed as a whole, which means either all tasks are executed, or none of them are executed. Based on the received tasks' information, vehicle $v$ selects its path $P_k^v$ and bids on the task set $S_k^v$ by uploading a quoted price $b_k^v$. Vehicle $v$ also has an associated true cost $c_k^v$ for path $P_k^v$, which is known privately by itself.

To effectively perform the tasks in $S_k^v$, vehicle $v$ takes a certain amount of working time $wt_k^v$. Depending on the rationality of vehicle $v$, it will select path $P_k^v$ and perform $S_k^v$ only if $wt_k^v + tr_k^v \le D_v$, where $tr_k^v$ is the traveling time of vehicle $v$ on path $P_k^v$. The working time $wt_k^v$ is estimated by vehicle $v$ according to the received tasks' load and its own capacity, and the traveling time $tr_k^v$ is calculated according to the path $P_k^v$ and the speed planned by vehicle $v$. When the vehicle reaches its destination, the platform will check whether the vehicle truthfully reported $tr_k^v$ as a basis for the integrity of vehicle $v$. Generally, the bid information uploaded by vehicle $v$ can be represented as a tuple $\beta_v = \{P_k^v, S_k^v, b_k^v, wt_k^v, tr_k^v\}$.

The payoff paid by the platform that vehicle $v$ can get by selecting the path $P_k^v$ and executing the task set $S_k^v$ is denoted as $pa_k^v$. Obviously, only if vehicle $v$ is selected by the platform to execute the tasks and uploads the tasks' results will it gain the payoff $pa_k^v$. Then, the net income $\psi_k^v$ that vehicle $v$ can obtain by executing the task set $S_k^v$ is given by

$$\psi_k^v = \theta_v \left( pa_k^v - c_k^v \right) \tag{1}$$

where $\theta_v$ indicates whether the platform selects vehicle $v$. If the platform selects vehicle $v$, then $\theta_v = 1$; otherwise, $\theta_v = 0$.

The goal for each vehicle is to maximize its net income by selecting an appropriate path. Therefore, the path selection problem for vehicle $v$, $v \in \mathcal{V}$, can be formulated as follows:

$$P1: \max_{\zeta} \sum_{k=1}^{l_v} \zeta_k^v \, \theta_v \left( pa_k^v - c_k^v \right)$$

$$\text{s.t. } C1: \sum_{k=1}^{l_v} \zeta_k^v \left( wt_k^v + tr_k^v \right) \le D_v \tag{2}$$

$$C2: \sum_{k=1}^{l_v} \zeta_k^v \le 1$$

where $\zeta_k^v$ indicates whether vehicle $v$ selects path $P_k^v$; if vehicle $v$ selects $P_k^v$, then $\zeta_k^v = 1$; otherwise, $\zeta_k^v = 0$. Constraint C1 ensures that the time spent by vehicle $v$ to travel to the destination


along the selected path does not exceed its deadline for reaching the destination, and constraint C2 ensures that vehicle $v$ selects at most one path.
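To make the structure of P1 concrete, the following minimal sketch solves it by enumeration for a single vehicle, assuming the platform decision $\theta_v$ and the per-path payoffs were known in advance. In the actual mechanism they are not known when the vehicle chooses, which is why the learning-based strategy of Section IV is used instead; all field names below are illustrative.

```python
def best_path(paths, deadline):
    """Enumerate problem P1 for one vehicle. Each entry of `paths` is a
    dict with hypothetical keys: 'wt' (working time), 'tr' (traveling
    time), 'pa' (payoff), 'c' (true cost), and 'theta' (0/1 platform
    selection). Returns the 0-based index of the best feasible path."""
    best_k, best_income = None, 0.0        # selecting no path yields 0
    for k, p in enumerate(paths):
        if p['wt'] + p['tr'] > deadline:   # constraint C1: meet the deadline
            continue
        income = p['theta'] * (p['pa'] - p['c'])   # net income, eq. (1)
        if income > best_income:
            best_k, best_income = k, income
    return best_k, best_income             # constraint C2: at most one path
```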
In recent years, the advantages of MADDPG in path design have been confirmed [25]. Therefore, we use MADDPG to make each vehicle learn the optimal path selection strategy.

IV. PROPOSED MADDPG-BASED PATH SELECTION STRATEGY

MADDPG [8], [38] is a typical pattern of machine learning, which makes agents learn policies to maximize benefits. At sensing round $t$, each vehicle, acting as an agent, outputs an action $a_v^t$ after observing the environment $o_v^t$. Later, it will get a reward $r_v^t$, then continue to observe the changing environment and adjust its strategy, as shown in Fig. 3. In a system with multiple vehicles, every vehicle's policy will constantly change during training. For a vehicle, other vehicles are all part of the environment, so the whole environment will become unstable, which will lead to difficult convergence. To this end, we employ two critical techniques in MADDPG [8].

1) Centralized training and distributed execution [8]: Centralized learning is utilized to train the critic and actor networks. During execution, each actor can run simply by knowing local information.

2) Augmented critic [8]: The critic network of each vehicle is reinforced by speculating the policies of others.

[Fig. 3. MADDPG-based vehicle path selection architecture.]

For the above problem P1, the state $o_v^t$, action $a_v^t$, and reward $r_v^t$ of each vehicle are denoted as follows.

A. Observation Space

The state is a surrounding environmental feature vector corresponding to P1. We consider the observation results of the first $L$ sensing rounds so that each vehicle gets a better understanding of environmental changes. At each sensing round $t$, vehicle $v$'s observation result is

$$o_v^t = \left\{ \mathcal{M}(P^v), D_v, \xi_v^{t-1}, \psi_v^{t-1}, \ldots, \xi_v^{t-L}, \psi_v^{t-L} \right\} \quad \forall v \in \mathcal{V} \tag{3}$$

where $\mathcal{M}(P^v)$ represents the details of all alternative paths to the destination for vehicle $v$, i.e., $\mathcal{M}(P^v) = \{\{P_1^v, S_1^v, c_1^v, b_1^v, wt_1^v, tr_1^v\}, \ldots, \{P_k^v, S_k^v, c_k^v, b_k^v, wt_k^v, tr_k^v\}, \ldots, \{P_{l_v}^v, S_{l_v}^v, c_{l_v}^v, b_{l_v}^v, wt_{l_v}^v, tr_{l_v}^v\}\}$, $\xi_v^{t-1}$ represents the path index selected by vehicle $v$ at sensing round $t-1$, i.e., $\xi_v^{t-1} = k$ (in case of $\zeta_k^v = 1$ at sensing round $t-1$), and $\psi_v^{t-1}$ indicates the net income of vehicle $v$ at sensing round $t-1$.

B. Action Space

At sensing round $t$, each vehicle $v$ observes the decisions of the platform in the first $L$ sensing rounds and then gives its path selection action at the current sensing round on the basis of the output of the actor network

$$a_v^t = \mu_v \left( o_v^t \,\middle|\, \varphi_{\mu_v} \right). \tag{4}$$

Therefore, the path selected by vehicle $v$ at sensing round $t$ is defined as

$$\xi_v^t = \left\lceil a_v^t \, l_v \right\rceil \tag{5}$$

where $\lceil x \rceil$ is used to obtain the result of rounding up $x$.
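As a small illustration of the ceiling rule in (5), the mapping from a continuous actor output to a discrete path index could be implemented as follows; clipping the raw action into (0, 1] is our assumption about the actor's output range, not a detail stated in the article.

```python
import math

def action_to_path_index(action: float, num_paths: int) -> int:
    """Ceiling mapping of (5): an actor output in (0, 1] is turned into
    a path index in {1, ..., num_paths}."""
    action = min(max(action, 1e-6), 1.0)   # assumed clipping into (0, 1]
    return math.ceil(action * num_paths)

# With l_v = 4 paths: outputs in (0, 0.25] map to path 1,
# (0.25, 0.5] to path 2, and so on.
```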
C. Reward Function

The reward that vehicle $v$ obtains at sensing round $t$ is denoted as

$$r_v^t = \sum_{k=1}^{l_v} \zeta_k^v(t) \, \theta_v^t \left( pa_k^v(t) - c_k^v(t) \right) \tag{6}$$

where $\zeta_k^v(t)$ indicates whether vehicle $v$ selects path $P_k^v$ at sensing round $t$, $pa_k^v(t)$ is the payoff paid by the platform that vehicle $v$ can get by selecting path $P_k^v$ and executing task set $S_k^v$ at sensing round $t$, and $c_k^v(t)$ is the true cost of vehicle $v$ for path $P_k^v$ and task set $S_k^v$ at sensing round $t$. $\theta_v^t$ and $pa_k^v(t)$ are determined by the platform's task allocation strategy.

D. MADDPG Networks Update

MADDPG includes a critic network $Q_v(o, a \mid \varphi_{Q_v})$ and an actor network $\mu_v(o_v \mid \varphi_{\mu_v})$ for each vehicle $v$. Each actor network $\mu_v(o_v \mid \varphi_{\mu_v})$ has a corresponding target actor network $\mu_v'(o_v \mid \varphi_{\mu_v'})$, and each critic network $Q_v(o, a \mid \varphi_{Q_v})$ has a target critic network $Q_v'(o, a' \mid \varphi_{Q_v'})$. The critic network [8], [38] of vehicle $v$ is updated with

$$L_v = \frac{1}{G} \sum_{g=1}^{G} \left( y_v^g - Q_v \left( o^g, a^g \,\middle|\, \varphi_{Q_v} \right) \right)^2 \tag{7}$$

where $o^g = \{o_1^g, \ldots, o_V^g\}$, $a^g = \{a_1^g, \ldots, a_V^g\}$, $G$ denotes the mini-batch size, and $y_v^g$ is the objective value calculated by

$$y_v^g = r_v^g + \gamma Q_v' \left( o^{(g+1)}, a'^{(g+1)} \,\middle|\, \varphi_{Q_v'} \right) \tag{8}$$

where $\gamma \in [0, 1]$ is the discount factor.

Vehicle $v$'s policy gradient [8] can be updated with

$$\nabla_{\varphi_{\mu_v}} J \approx \frac{1}{G} \sum_{g=1}^{G} \nabla_{\varphi_{\mu_v}} a_v \, \nabla_{a_v} Q_v \left( o^g, a_v^g, a_{\mathcal{V} \setminus v}^g \,\middle|\, \varphi_{Q_v} \right) \tag{9}$$

where $a_{\mathcal{V} \setminus v}^g = \{a_1^g, \ldots, a_{v-1}^g, a_{v+1}^g, \ldots, a_V^g\}$.

The target networks of vehicle $v$ are softly updated [8] by

$$\varphi_{Q_v'} \leftarrow \tau \varphi_{Q_v} + (1 - \tau) \varphi_{Q_v'}$$
$$\varphi_{\mu_v'} \leftarrow \tau \varphi_{\mu_v} + (1 - \tau) \varphi_{\mu_v'} \tag{10}$$

where $\tau$ is a soft update factor.

For each sensing round, after the selected paths of the vehicles are determined, based on the information uploaded by the vehicles, the MCS platform performs the task allocation to determine $\theta_v$ and $pa_k^v$.
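The updates (7)–(10) can be summarized in a schematic PyTorch sketch. The `agents` container, its attribute names, and the tensor layout are assumptions made for illustration and are not taken from the article or its Algorithm 2.

```python
import torch
import torch.nn.functional as F

def maddpg_update(batch, agents, gamma=0.95, tau=0.01):
    """One update per vehicle following (7)-(10). `batch` holds tensors
    obs [G, V, obs_dim], acts [G, V, act_dim], rews [G, V], and
    next_obs [G, V, obs_dim]; each agent exposes actor/critic networks,
    their targets, and optimizers (illustrative names)."""
    obs, acts, rews, next_obs = batch
    with torch.no_grad():   # joint target actions for the centralized critics
        next_acts = torch.stack(
            [ag.target_actor(next_obs[:, v]) for v, ag in enumerate(agents)],
            dim=1)
    for v, ag in enumerate(agents):
        # Critic loss (7) against the target value y of (8).
        with torch.no_grad():
            y = rews[:, v:v + 1] + gamma * ag.target_critic(next_obs, next_acts)
        critic_loss = F.mse_loss(ag.critic(obs, acts), y)
        ag.critic_opt.zero_grad(); critic_loss.backward(); ag.critic_opt.step()
        # Policy gradient (9): vehicle v's action comes from its current
        # actor, while the other vehicles' actions a_{V\v} stay as sampled.
        acts_v = acts.clone()
        acts_v[:, v] = ag.actor(obs[:, v])
        actor_loss = -ag.critic(obs, acts_v).mean()
        ag.actor_opt.zero_grad(); actor_loss.backward(); ag.actor_opt.step()
        # Soft target updates (10) with factor tau.
        for net, tgt in ((ag.actor, ag.target_actor),
                         (ag.critic, ag.target_critic)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```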


V. PROPOSED OPTIMIZED TASK ASSIGNMENT STRATEGY

Given all vehicle bids, the platform strives to maximize its satisfaction by performing the task allocation. First, we quantify the satisfaction each vehicle can provide to the platform and then determine the task assignment result.

A. Weights Determination and Satisfaction Calculation

Our optimization objectives for selecting the appropriate vehicles from the candidate vehicles involve minimizing the bid prices and time consumption of the chosen vehicles to satisfy the demands of TIs, and maximizing the number of covering tasks on the paths submitted by vehicles to select as few vehicles as possible, thereby economizing the communication cost of the system. Therefore, we assume that the satisfaction that each vehicle can contribute to the platform is a combination of the bid price, the time consumption, and the task coverage of its uploaded path. Generally, the higher the bid price of a vehicle's uploaded path, the lower the satisfaction it can provide to the platform. The longer the time consumption, the lower the satisfaction it can provide. The more tasks covered by the paths selected by the vehicles, the higher the satisfaction it can provide. In consequence, a linear function is adopted to formulate the influence of the three criteria on the satisfaction level $SA_v$ that vehicle $v$ provides to the platform as follows:

$$SA_v = -w_1 \frac{H_v}{H_{\max}} - w_2 \frac{B_v}{B_{\max}} + w_3 \frac{O_v}{O_{\max}} \tag{11}$$

where $H_v$ is the time consumption, $B_v$ is the bid price, and $O_v$ is the number of covering tasks of the path submitted by vehicle $v$ to the platform. Namely, $H_v = wt_k^v + tr_k^v$, $B_v = b_k^v$, $O_v = |S_k^v|$, and $k = \xi_v^t$. $H_{\max}$ is the maximum time consumption of the paths submitted by all vehicles, $B_{\max}$ is the maximum bid price of the paths submitted by all vehicles, and $O_{\max}$ is the maximum number of covering tasks of the paths uploaded by all vehicles. $w_1$, $w_2$, and $w_3$ are the weights to evaluate the comparative importance of the different parameters. Hence, they are all nonnegative, and their sum is 1.

The AHP [13], a high-efficiency model to calculate relative rankings, is very appropriate for our relative satisfaction ranking determination problem. Therefore, we employ the AHP [13] to calculate $w_1$, $w_2$, and $w_3$. $X_1$, $X_2$, and $X_3$ are used to denote the time consumption, the bid price, and the covering task numbers of the path, respectively. $W = (w_1, w_2, w_3)^T$ is the weights vector. The pairwise comparison matrix $U = (u_{ij})_{3 \times 3}$ is adopted [39] to measure the comparative importance between different parameters, in which the numerical values are usually given by specialists and vary according to different scenarios, and each element $u_{ij}$ expresses the importance of $X_i$ versus $X_j$. When $u_{ij} > 1$, $X_i$ is more important than $X_j$. When $u_{ij} < 1$, $X_i$ is less important than $X_j$. $u_{ij} = 1$ when $X_i$ is as important as $X_j$, and $u_{ij} \times u_{ji} = 1$. In AHP, an appropriate integer can be selected between 1 and 9 for $u_{ij}$ based on the comparative importance of the different criteria in the actual scenario.

$U = (u_{ij})_{3 \times 3}$ in Fig. 4(a) is a simple example. For instance, $u_{12} = 3$ manifests that the time consumption $X_1$ of the path is more important than the bid price $X_2$. Then, each column of $U$ is normalized to obtain the normalized $\bar{U}$ shown in Fig. 4(b), i.e., $\bar{u}_{ij} = u_{ij} / \sum_{n=1}^{3} u_{nj}$. Then, $W = (w_1, w_2, w_3)^T$ is computed by taking the average of each row of $\bar{U}$, namely

$$w_i = \frac{1}{3} \sum_{j=1}^{3} \bar{u}_{ij}. \tag{12}$$

[Fig. 4. Pairwise comparison matrix and its normalized version.]

After obtaining $W$, the satisfaction level that vehicle $v$ provides to the platform can be calculated according to (11). We then give the specific rule for calculating actual satisfaction based on the satisfaction level as follows:

$$SAT_v = SA_0 + \varepsilon SA_v \tag{13}$$

where $SA_0$ is the basic satisfaction, and $\varepsilon$ is a coefficient determined by the MCS platform that links the satisfaction level of the vehicle with the actual satisfaction.
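The weight extraction of (12) and the satisfaction rule of (11) and (13) are straightforward to compute; a sketch follows. Only $u_{12} = 3$ comes from the example in Fig. 4(a); the remaining matrix entries, $SA_0$, and $\varepsilon$ are placeholder values chosen for illustration.

```python
import numpy as np

def ahp_weights(U):
    """AHP weights of (12): column-normalize the pairwise comparison
    matrix U as in Fig. 4(b), then average each row."""
    U = np.asarray(U, dtype=float)
    U_bar = U / U.sum(axis=0, keepdims=True)   # normalized matrix
    return U_bar.mean(axis=1)                  # w_i = (1/3) sum_j u_bar_ij

def actual_satisfaction(H, B, O, w, H_max, B_max, O_max, SA0=1.0, eps=1.0):
    """Satisfaction level (11) mapped to actual satisfaction (13).
    SA0 and eps are platform-chosen constants (placeholder values)."""
    SA = -w[0] * H / H_max - w[1] * B / B_max + w[2] * O / O_max
    return SA0 + eps * SA

# Illustrative comparison matrix with u12 = 3 (time more important than
# price) and reciprocal entries satisfying u_ij * u_ji = 1.
U = [[1.0, 3.0, 0.5],
     [1/3, 1.0, 0.2],
     [2.0, 5.0, 1.0]]
w = ahp_weights(U)   # nonnegative weights summing to 1
```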
B. Task Assignment That Maximizes Satisfaction

The MCS platform's objective is to optimize satisfaction by assigning tasks to appropriate vehicles. Therefore, the MSBS problem is formalized as follows:

$$P2: \max_{\theta} \sum_{v \in \mathcal{V}} \theta_v SAT_v$$

$$\text{s.t. } C1: \sum_{v \in \mathcal{V}} \theta_v B_v \le B \tag{14}$$

$$C2: \sum_{v \in \mathcal{V}} \theta_v \lambda_v^s \ge 1 \quad \forall s \in \mathcal{S}$$

where $B$ is the budget of the platform, $\lambda_v^s$ denotes whether task $s$ is contained in $S_v$, and $S_v$ is the task set covered by the path selected by vehicle $v$. If task $s$ is in $S_v$, then $\lambda_v^s = 1$; otherwise, $\lambda_v^s = 0$. Constraint C1 ensures that the bid prices of all selected vehicles do not exceed the budget, and constraint C2 guarantees that any published task is included in the task set covered by at least one vehicle.

Next, we analyze the MSBS problem's NP-hardness. Then, an optimized algorithm is developed to work it out.

Theorem 1: The MSBS problem is NP-hard.

Proof: We focus on a simple scenario with only one task $s$. The MCS platform's objective is to improve its satisfaction by choosing a subset of $\mathcal{V}$ for task $s$, provided that the total bid prices of the vehicles chosen for task $s$ cannot exceed the budget $B$. Then, the MSBS problem can be formulated as follows:

$$P3: \max_{\theta} \sum_{v \in \mathcal{V}} \theta_v SAT_v \quad \text{s.t. } \sum_{v \in \mathcal{V}} \theta_v B_v \le B. \tag{15}$$

According to Kellerer et al. [40], P3 is an NP-hard knapsack problem. Therefore, P2 is at least NP-hard.
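Since the proof maps P3 onto the 0/1 knapsack problem, the single-task case can be solved exactly by the standard pseudo-polynomial dynamic program when bid prices are integers. The sketch below is only intended to give intuition for the hardness argument; it is not part of SBCT.

```python
def solve_p3(sats, bids, budget):
    """0/1 knapsack DP for P3 in (15): pick vehicles maximizing total
    satisfaction with total (integer) bid price at most `budget`."""
    best = [0.0] * (budget + 1)
    for sat, bid in zip(sats, bids):
        for b in range(budget, bid - 1, -1):   # backwards: each vehicle once
            best[b] = max(best[b], best[b - bid] + sat)
    return best[budget]
```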
Next, a high-efficiency task distribution algorithm, SBCT, is designed, which makes the utmost of the limited budget to achieve great satisfaction. Algorithm 1 presents a detailed description of the designed SBCT algorithm.

[Algorithm 1: Optimized Allocation SBCT.]

Step 1: For each task $s$ and each vehicle, if the current task can be covered by the path uploaded by the current vehicle,


then incorporate the current vehicle into the set of alternative vehicles for the current task. If there is only one vehicle in the set of available vehicles for the current task, it is called a critical vehicle. If the critical vehicle is not currently selected and the platform has enough remaining budget, the critical vehicle is selected. Then, the remaining budget of the platform and the set of tasks already covered are updated accordingly based on the bid prices of the critical vehicles and the sets of tasks covered by their uploaded paths.

Step 2: For each unassigned task, select, from the currently unselected vehicles whose uploaded paths can cover the current task, the vehicle that maximizes the ratio of satisfaction to bid price. Then, the remaining budget and the set of tasks included in the uploaded routes of the selected vehicles are updated according to the quoted prices of the currently selected vehicles and the sets of tasks covered by their selected paths.

Step 3: The remaining budget is updated according to the bid prices of the vehicles in the currently selected vehicle set, and then the unselected vehicles are sorted in descending order of the ratio of satisfaction to bid price. After sorting, the unselected vehicles are selected in turn within the remaining budget constraint until the remaining budget is not enough to select any unselected bidding vehicle. Hence, we obtain the MCS platform's task allocation solution, namely, $\theta_v$, $\forall v \in \mathcal{V}$.

After determining the task allocation strategy, the chosen vehicles carry out the assigned tasks and submit the collected results. Then, the MCS platform allocates the total bid prices of the selected vehicles according to satisfaction [41] via

$$pa_v = \frac{SAT_v}{\sum_{v \in \mathcal{V}} \theta_v SAT_v} \times \sum_{v \in \mathcal{V}} \theta_v B_v \tag{16}$$

where $pa_v$ is the payoff paid by the platform to vehicle $v$, $v \in \mathcal{J}$.
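Putting Steps 1–3 and the payoff rule (16) together, a compact sketch of the allocation could look as follows. The bid tuple layout `(vehicle, task_set, bid_price, satisfaction)` is an assumption, tie-breaking details of Algorithm 1 are omitted, and the actual satisfactions are assumed positive, as (13) suggests for a suitable $SA_0$.

```python
def sbct_allocate(bids, tasks, budget):
    """Sketch of SBCT. `bids` is a list of tuples
    (vehicle, task_set, bid_price, satisfaction); `tasks` is the task set."""
    selected, covered, remaining = set(), set(), budget
    # Step 1: a task coverable by exactly one path makes that vehicle critical.
    for s in tasks:
        cands = [b for b in bids if s in b[1]]
        if len(cands) == 1:
            v, S_v, B_v, _ = cands[0]
            if v not in selected and B_v <= remaining:
                selected.add(v); covered |= S_v; remaining -= B_v
    # Step 2: cover each remaining task by the best satisfaction/price ratio.
    for s in tasks:
        if s in covered:
            continue
        cands = [b for b in bids
                 if b[0] not in selected and s in b[1] and b[2] <= remaining]
        if cands:
            v, S_v, B_v, _ = max(cands, key=lambda b: b[3] / b[2])
            selected.add(v); covered |= S_v; remaining -= B_v
    # Step 3: spend the leftover budget on unselected bids, best ratio first.
    for v, S_v, B_v, _ in sorted(bids, key=lambda b: b[3] / b[2], reverse=True):
        if v not in selected and B_v <= remaining:
            selected.add(v); covered |= S_v; remaining -= B_v
    # Payoff split of (16): total accepted bid price shared by satisfaction.
    total_bid = sum(b[2] for b in bids if b[0] in selected)
    total_sat = sum(b[3] for b in bids if b[0] in selected)
    payoffs = {b[0]: b[3] / total_sat * total_bid
               for b in bids if b[0] in selected}
    return selected, payoffs
```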


For the platform, each vehicle only uploads information about its selected path, so here we represent $pa_k^v$ as $pa_v$. For vehicle $v$, $pa_k^v = pa_v$ when $k = \xi_v^t$; otherwise, $pa_k^v = 0$.

At each sensing round $t$, each vehicle $v$ updates its observation according to the task allocation results and then adjusts its next path selection strategy. With continuous adjustments, the vehicles can finally learn good strategies to maximize their net income. Algorithm 2 summarizes the training process of the MADDPG-based path selection algorithm.

[Algorithm 2: Training Process of MADDPG-Based Path Selection Algorithm.]

First, the algorithm initializes the parameters of the neural networks in lines 1 and 2. Also, the experience replay pool is initialized in line 3. For each episode, the noise of the action is generated according to a random process, and initial observations are obtained in lines 5 and 6. In each sensing round of the episode, each vehicle feeds observations into the actor network and obtains actions based on the output of the actor network and behavioral noise in line 8. After getting the action, each vehicle calculates the path index according to (5) in line 9. After that, each vehicle summarizes the information about its selected path into $\beta_v = \{P_k^v, S_k^v, b_k^v, wt_k^v, tr_k^v\}$ and uploads it to the platform for bidding. After receiving the bids uploaded by the vehicles, based on the bid prices, the time consumption, and the status of covering tasks on the paths submitted by the vehicles, the platform calculates the satisfaction level that each bidding vehicle can provide according to (11). Algorithm 1 is used to solve the task allocation problem in (14) of maximizing satisfaction in line 10. After selecting the appropriate vehicles from the bidding vehicles, the platform notifies the winning vehicles. The selected vehicles reach the task locations and upload the task results, and then the platform rewards these vehicles according to (16) in lines 11 and 12. The vehicles calculate the actual reward for their path selection action according to (6) based on the received payment, update the states, and obtain new observation results in line 13. Compared with the previous sensing round, the observation results of the first $L$ sensing rounds are updated. Then, $(o^t, a^t, r^t, o^{t+1})$ is stored into the buffer pool in line 14. Each vehicle updates its actor and critic networks based on the sampled experiences in lines 16–20.
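The whole training flow described above can be condensed into a short skeleton that ties together the observation (3), the action mapping (5), the allocation of Algorithm 1, the reward (6), and the network updates (7)–(10). The `env`, `agents`, and `buffer` helpers are hypothetical, and `action_to_path_index`, `sbct_allocate`, and `maddpg_update` refer to the earlier sketches; none of these names come from the article.

```python
def train(env, agents, buffer, episodes, rounds, batch_size, noise):
    """Skeleton of the MADDPG-based training loop under the stated
    assumptions; it mirrors the line-by-line description of Algorithm 2."""
    for _ in range(episodes):
        obs = env.reset()                              # observations as in (3)
        for _ in range(rounds):
            acts = [float(ag.actor(obs[v])) + noise()  # actor output + noise
                    for v, ag in enumerate(agents)]
            paths = [action_to_path_index(a, env.num_paths(v))
                     for v, a in enumerate(acts)]      # ceiling rule (5)
            winners, payoffs = sbct_allocate(
                env.collect_bids(paths), env.tasks, env.budget)
            rewards, next_obs = env.step(paths, winners, payoffs)  # rewards (6)
            buffer.add(obs, acts, rewards, next_obs)
            obs = next_obs
            if len(buffer) >= batch_size:              # updates (7)-(10)
                maddpg_update(buffer.sample(batch_size), agents)
```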
C. Computational Complexity Analysis

According to the task allocation process of Algorithm 1, we must traverse all tasks, and for each task, we must traverse all vehicles. For each current vehicle, we need to judge whether the current task is in the set of tasks passed by the current vehicle. Therefore, the complexity of Algorithm 1 is $O(S^2 \times V)$.

In Algorithm 2, the actor network is fully connected. Assuming that the number of neurons in the $x$th layer is $N_x^a$, the time complexity of the $x$th layer is $O(N_{x-1}^a N_x^a + N_x^a N_{x+1}^a)$. Therefore, the overall time complexity of an actor network consisting of $X$ layers is $O\left(\sum_{x=2}^{X-1} \left(N_{x-1}^a N_x^a + N_x^a N_{x+1}^a\right)\right)$. During testing, only the trained actor networks need to be used to select actions, so the time complexity mainly depends on the forward propagation operations of the actor networks. Therefore, the time complexity of the algorithm during testing is $O\left(V \times T \times \left(\sum_{x=2}^{X-1} \left(N_{x-1}^a N_x^a + N_x^a N_{x+1}^a\right) + S^2\right)\right)$.

VI. PERFORMANCE EVALUATION

A. Simulation Setup

1) Experimental Dataset: In order to make the experiment more credible, we use the T-Drive Beijing vehicle trajectory dataset given by the authors in [42] and [43], which contains nearly 15 million trajectory points for 10 357 taxis. Fig. 5 shows the vehicle trajectory coverage of Beijing city.

[Fig. 5. Map of trajectory coverage based on the Beijing taxi dataset.]

Due to the large number of records, for the convenience of experimental research, we choose a specific urban area from latitude 39.9074° to 39.9192° and longitude 116.4287° to 116.4444° [10]. For this specific area, we generate 15 tasks, and the latitude and longitude of each task are located on the roads. In the selected area, we select 15 vehicles with longer trajectories. For each selected vehicle, we choose two trajectory points as the current location and destination, respectively, and all paths from the current location to the destination satisfying the conditions in this area are viewed as optional paths. According to the optional paths and the location information of the tasks, the proposed MADDPG-based path selection algorithm is used to choose the appropriate path for each vehicle.

2) Experimental Parameters: In our simulations, each vehicle's actor network and critic network consist of one input layer, three hidden layers, and one output layer. The three hidden layers contain 512, 256, and 32 neurons, respectively. Other simulation parameter settings are given in Table III.

[TABLE III. Simulation parameter settings.]

3) Experimental Comparison: Our MADDPG and SBCT mechanism is compared with the following baselines.

1) Brute force scheme (BFS): BFS explores all feasible cases of path selection for each vehicle and task assignment for the platform, thereby obtaining the global optimum strategy. Although this algorithm can obtain the optimal solution, it is obvious that the computational complexity


of this algorithm is extremely high. In the worst case, the time complexity of traversing all the vehicles' optional path combinations is $O((l_v)^V)$, and the time complexity of the platform traversing all optional vehicle subsets is $O(2^V)$. Therefore, the time complexity of the BFS scheme is $O((l_v)^V + 2^V)$.

2) Greedy: Each vehicle greedily selects the path covering the largest number of tasks. The MCS platform greedily chooses the vehicles with the maximum satisfaction within the budget.

3) Random: Each moving vehicle randomly chooses a path from its available paths, and the MCS platform randomly decides the vehicles to perform the tasks according to the budget.

4) Rational [44]: Allocation rationality is calculated by considering geographical information and task characteristics (i.e., route distance, task similarity, and task priority). A set of task locations is assigned to a set of workers, and location access sequences are generated to achieve the greatest allocation rationality.
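For contrast with SBCT, the platform side of the greedy baseline admits a very short sketch; the bid fields `(vehicle, bid_price, satisfaction)` are assumed for illustration.

```python
def greedy_platform(bids, budget):
    """Greedy baseline: repeatedly pick the unselected bid with the
    highest satisfaction that still fits within the remaining budget."""
    selected, remaining = [], budget
    for vehicle, price, sat in sorted(bids, key=lambda b: b[2], reverse=True):
        if price <= remaining:
            selected.append(vehicle)
            remaining -= price
    return selected
```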
[Fig. 6. Convergence of the proposed MADDPG and SBCT incentive mechanism. (a) Path selection of each vehicle. (b) Net income of each vehicle. (c) Total net income of selected vehicles. (d) Selection of platform for each vehicle. (e) Satisfaction of platform. (f) Total bid price of selected vehicles versus budget.]

B. Behavior of the Proposed MADDPG and SBCT Incentive Mechanism

In Fig. 6, the convergence of the proposed MADDPG and SBCT mechanism is comprehensively demonstrated. In the beginning, each vehicle explores a different path selection strategy in Fig. 6(a). As a countermeasure, the platform dynamically adjusts its selection for each vehicle in Fig. 6(d), therefore resulting in large fluctuations in each vehicle's net income in Fig. 6(b) and the platform's satisfaction in Fig. 6(e). If the paths uploaded by the vehicles do not match each task, the platform will tend to select fewer vehicles in Fig. 6(b) and (d), because the platform favors vehicles that supply extensive task coverage, low bid price, and low time consumption. Therefore, each vehicle must compete with others and dynamically choose its path to keep a relatively high net income in Fig. 6(b). After about 900 iterations, the strategies of each vehicle and the platform all tend to be stable, each vehicle obtains a net income no less than its initial net income in Fig. 6(b), and the platform's satisfaction reaches a high level in Fig. 6(e). Although there are fluctuations in the individual vehicles' net income, the total net income gradually increases with iterations and eventually stabilizes in Fig. 6(c). As expected, the total bid prices of all selected vehicles converge to the platform's budget in Fig. 6(f), which shows that the MADDPG and SBCT mechanism can take full advantage of the budget.

C. Comparison With Existing Algorithms

Apart from the total net income and the satisfaction of the platform, we introduce two additional indicators to assess the performance, namely, the task completion rate and the platform's average remaining budget, where the former is the proportion of the number of accomplished tasks to the total number of published tasks $S$, and the latter is the average difference between the platform's budget and the selected vehicles' total bid price.

Fig. 7 presents the influence of the platform's budget on performance. We set $V$ and $S$ to 10 and 15, respectively. Fig. 7(a) and (b) illustrate that the total net income of vehicles and the platform's satisfaction of all algorithms increase with the budget because the total bid prices of the selected vehicles converge to


[Fig. 7. Comparison of the proposed MADDPG and SBCT incentive mechanism and others on the total net income, satisfaction of platform, task completion rate, and average remaining budget under different platform budgets.]

the platform's budget, so as the budget increases, all algorithms can recruit more vehicles. In Fig. 7(c), the task completion rates of the proposed MADDPG and SBCT mechanism and the BFS algorithm can reach 100% and are even higher than those of the random, greedy, and rational algorithms under any budget. This is because the proposed MADDPG and SBCT mechanism, through the path selection strategy based on MADDPG, ensures that each vehicle learns a good strategy and makes the utmost of the budget to assign as many tasks as possible. For the greedy algorithm, each vehicle chooses a path that covers more tasks to obtain more profits, which results in some task locations not being covered by any vehicle's path, so the task completion rate of the greedy algorithm is relatively low. For the rational algorithm, the platform chooses the appropriate route for each vehicle according to the distance between the current task position and the previous task position in each alternative path and the priority of the tasks, so it can achieve a higher task completion rate than the greedy algorithm, which blindly selects the path covering more tasks for each vehicle. However, because the rational algorithm pays no attention to the time cost and price of the paths, when the platform budget is limited and each vehicle has a clear deadline to the destination, its task completion rate cannot show good performance compared with the proposed MADDPG and SBCT mechanism. Fig. 7(d) illustrates that the average remaining budget of the proposed MADDPG and SBCT mechanism is less than those of the rational, greedy, and random algorithms because our mechanism can improve the platform's satisfaction by taking full advantage of the budget. In general, under different budgets, the proposed incentive mechanism is better than the other baseline schemes in terms of total net income, platform satisfaction, and task completion rate, and can make fuller use of the platform budget. For example, under different platform budgets, when the number of vehicles is ten and the number of tasks is ten, the total net income of vehicles, platform satisfaction, and task completion rate of the proposed MADDPG and SBCT incentive mechanism are, respectively, 12.01%, 18.57%, and 30.47% higher than those of the rational algorithm.

[Fig. 8. Comparison of the proposed MADDPG and SBCT incentive mechanism and others on the total net income, satisfaction of platform, task completion rate, and average remaining budget under different task numbers.]

Supposing that $B = 45$ and $V = 10$ or 15, Fig. 8 illustrates the influence of increasing the task numbers on performance. Fig. 8(a) and (b) illustrate that, with the change in the task numbers, the vehicles' overall net income and the platform satisfaction of all algorithms do not change much, because the total bid prices of all selected vehicles converge to the platform's budget; if the budget remains unchanged, the selected vehicles are relatively stable given the same vehicles. However, we can find from Fig. 8(b) that the platform satisfaction of the same algorithm with the same tasks is generally higher when $V$ is 15 than when $V$ is 10. This is because, under the same budget, when the number of vehicles is larger, the platform has the opportunity to select vehicles with more task coverage, lower bid prices, and shorter time consumption, which improves the platform's satisfaction and task completion rate. With the increase in tasks, the task completion rates of the random, greedy, and rational algorithms decrease in Fig. 8(c), and accordingly, the average remaining budgets of the random, greedy, and rational algorithms increase in Fig. 8(d). This is because, as the number of tasks increases, the paths uploaded by all bidding vehicles in the random, greedy, and rational algorithms can hardly cover all tasks, resulting in task allocation failure. However, since the rational algorithm chooses the path with the locations of adjacent tasks as close as possible, its task completion rate decreases more slowly as the number of tasks increases than those of the greedy and random algorithms. Fig. 9 shows the influence of changes in the vehicle number on performance. In Fig. 9, with the increasing number of vehicles, the change in total net income for all algorithms is small, while the satisfaction of the platform and the task completion rate of all algorithms are on the rise. This is because, when the platform's budget is unchanged, the total bid prices of all selected vehicles change little. From (16), we know that the total net income converges to the total bid prices of all selected vehicles, so the total net income changes little. Furthermore, with the increase in vehicle numbers, vehicles with more task coverage and lower bid prices can be selected, which makes the platform satisfaction and task completion rate both increase, and correspondingly, the average remaining budget of


[Fig. 9. Comparison of the proposed MADDPG and SBCT incentive mechanism and others on the total net income, satisfaction of platform, task completion rate, and average remaining budget under different vehicle numbers.]

the platform with the random, greedy, and rational algorithms shows a downward trend.

VII. CONCLUSION

This article researched the vehicle path selection problem and the sensing task allocation problem under budget constraints in MCS. We proposed a distributed path selection algorithm based on MADRL, in which each vehicle observed its transaction records and adjusted its path selection strategy iteratively by interacting with the surrounding environment. The indicator of the satisfaction level was introduced to characterize the satisfaction that each vehicle can provide to the platform by considering the time consumption, the bid price, and the number of covering tasks on the path that each vehicle uploads to the platform. Then, we proposed the MSBS problem under budget constraints, which is proven to be NP-hard, and designed a high-efficiency task allocation algorithm, SBCT, that can make the utmost of the budget of the platform to maximize satisfaction. We compared the proposed MADDPG and SBCT incentive mechanism with the optimal results obtained by the BFS algorithm, the rational algorithm, the greedy algorithm, and the random algorithm. Abundant experimental results manifested that both the platform and the vehicles learn strategies that attain approximately optimal returns. In future work, we will further consider protecting user privacy and design more effective mechanisms to cope with highly dynamic environments and meet stringent real-time requirements.

REFERENCES

[1] P. Arthurs, L. Gillam, P. Krause, N. Wang, K. Halder, and A. Mouzakitis, "A taxonomy and survey of edge cloud computing for intelligent transportation systems and connected vehicles," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6206–6221, Jul. 2022.
[2] R. Gao, F. Sun, W. Xing, D. Tao, J. Fang, and H. Chai, "CTTE: Customized travel time estimation via mobile crowdsensing," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19335–19347, Oct. 2022.
[3] X. Zhu, Y. Luo, A. Liu, W. Tang, and M. Z. A. Bhuiyan, "A deep learning-based mobile crowdsensing scheme by predicting vehicle mobility," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 7, pp. 4648–4659, Jul. 2021.
[4] X. Liu, W. Chen, Y. Xia, and R. Shen, "TRAMS: A secure vehicular crowdsensing scheme based on multi-authority attribute-based signature," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 12790–12800, Aug. 2022.
[5] P. Mohan, V. N. Padmanabhan, and R. Ramjee, "Nericell: Rich monitoring of road and traffic conditions using mobile smartphones," in Proc. ACM SenSys, 2008, pp. 323–336.
[6] P. Dutta et al., "Common sense: Participatory urban sensing using a network of handheld air quality monitors," in Proc. 7th ACM Conf. Embedded Netw. Sensor Syst., 2009, pp. 349–350.
[7] I. Schweizer et al., "Noisemap: Multi-tier incentive mechanisms for participative urban sensing," in Proc. 3rd Int. Workshop Sens. Appl. Mobile Phones, 2012, pp. 1–5.
[8] B. Gu, X. Yang, Z. Lin, W. Hu, M. Alazab, and R. Kharel, "Multiagent actor-critic network-based incentive mechanism for mobile crowdsensing in industrial systems," IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6182–6191, Sep. 2021.
[9] Y. Zhao and C. H. Liu, "Social-aware incentive mechanism for vehicular crowdsensing by deep reinforcement learning," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 4, pp. 2314–2325, Apr. 2021.
[10] X. Chen, L. Zhang, Y. Pang, B. Lin, and Y. Fang, "Timeliness-aware incentive mechanism for vehicular crowdsourcing in smart cities," IEEE Trans. Mobile Comput., vol. 21, no. 9, pp. 3373–3387, Sep. 2022.
[11] X. Li and X. Zhang, "Multi-task allocation under time constraints in mobile crowdsensing," IEEE Trans. Mobile Comput., vol. 20, no. 4, pp. 1494–1510, Apr. 2021.
[12] L. Wang, Z. Yu, D. Zhang, B. Guo, and C. H. Liu, "Heterogeneous multi-task assignment in mobile crowdsensing using spatiotemporal correlation," IEEE Trans. Mobile Comput., vol. 18, no. 1, pp. 84–97, Jan. 2019.
[13] Y. Huang et al., "OPAT: Optimized allocation of time-dependent tasks for mobile crowdsensing," IEEE Trans. Ind. Informat., vol. 18, no. 4, pp. 2476–2485, Apr. 2022.
[14] J. Wang, X. Feng, T. Xu, H. Ning, and T. Qiu, "Blockchain-based model for nondeterministic crowdsensing strategy with vehicular team cooperation," IEEE Internet Things J., vol. 7, no. 9, pp. 8090–8098, Sep. 2020.
[15] Y. Liu, N. Chen, X. Zhang, X. Liu, Y. Yi, and N. Zhao, "Research on multi-task assignment model based on task similarity in crowdsensing," in Proc. IEEE Int. Conf. Commun., 2021, pp. 523–528.
[16] Y. Liu, H. Wang, M. Peng, J. Guan, and Y. Wang, "An incentive mechanism for privacy-preserving crowdsensing via deep reinforcement learning," IEEE Internet Things J., vol. 8, no. 10, pp. 8616–8631, May 2021.
[17] Y. Pei, G. Zhang, F. Hou, and G. Yang, "Online optimal algorithm design for mobile crowdsensing with dual-role users," in Proc. IEEE Veh. Technol. Conf., 2021, pp. 1–5.
[18] D. Deng, C. Shahabi, and U. Demiryurek, "Maximizing the number of worker's self-selected tasks in spatial crowdsourcing," in Proc. ACM Sigspatial Int. Conf. Adv. Geographic Inf. Syst., 2013, pp. 324–333.
[19] M. H. Cheung, F. Hou, and J. Huang, "Delay-sensitive mobile crowdsensing: Algorithm design and economics," IEEE Trans. Mobile Comput., vol. 17, no. 12, pp. 2761–2774, Dec. 2018.
[20] W. Li, B. Jia, H. Xu, Z. Zong, and T. Watanabe, "A multi-task scheduling mechanism based on ACO for maximizing workers' benefits in mobile crowdsensing service markets with the Internet of Things," IEEE Access, vol. 7, pp. 41463–41469, 2019.
[21] X. Tao and W. Song, "Location-dependent task allocation for mobile crowdsensing with clustering effect," IEEE Internet Things J., vol. 6, no. 1, pp. 1029–1045, Feb. 2019.
[22] G. Gao, H. Huang, M. Xiao, J. Wu, Y.-E. Sun, and Y. Du, "Budgeted unknown worker recruitment for heterogeneous crowdsensing using CMAB," IEEE Trans. Mobile Comput., vol. 21, no. 11, pp. 3895–3911, Nov. 2022.
[23] G. Gao, M. Xiao, J. Wu, L. Huang, and C. Hu, "Truthful incentive mechanism for nondeterministic crowdsensing with vehicles," IEEE Trans. Mobile Comput., vol. 17, no. 12, pp. 2982–2997, Dec. 2018.
[24] Y. Hui et al., "BCC: Blockchain-based collaborative crowdsensing in autonomous vehicular networks," IEEE Internet Things J., vol. 9, no. 6, pp. 4518–4532, Mar. 2022.
[25] K. Wei et al., "High-performance UAV crowdsensing: A deep reinforcement learning approach," IEEE Internet Things J., vol. 9, no. 19, pp. 18487–18499, Oct. 2022.
[26] W. Tan, L. Zhao, B. Li, L. Xu, and Y. Yang, "Multiple cooperative task allocation in group-oriented social mobile crowdsensing," IEEE Trans. Serv. Comput., vol. 15, no. 6, pp. 3387–3401, Nov./Dec. 2022.

[27] B. Tang, Z. Li, and K. Han, “Multi-agent reinforcement learning for mobile crowdsensing systems with dedicated vehicles on road networks,” in Proc. IEEE Int. Intell. Transp. Syst. Conf., 2021, pp. 3584–3589.
[28] L. Liu et al., “Evenness-aware data collection for edge-assisted mobile crowdsensing in Internet of Vehicles,” IEEE Internet Things J., vol. 10, no. 1, pp. 1–16, Jan. 2023.
[29] B. Cao, S. Xia, J. Han, and Y. Li, “A distributed game methodology for crowdsensing in uncertain wireless scenario,” IEEE Trans. Mobile Comput., vol. 19, no. 1, pp. 15–28, Jan. 2020.
[30] M. Li, M. Ma, L. Wang, B. Yang, T. Wang, and J. Sun, “Multitask-oriented collaborative crowdsensing based on reinforcement learning and blockchain for intelligent transportation system,” IEEE Trans. Ind. Informat., vol. 19, no. 9, pp. 9503–9514, Sep. 2023.
[31] H. Wang, Y. Yang, E. Wang, W. Liu, Y. Xu, and J. Wu, “Truthful user recruitment for cooperative crowdsensing task: A combinatorial multi-armed bandit approach,” IEEE Trans. Mobile Comput., vol. 22, no. 7, pp. 4314–4331, Jul. 2023.
[32] H. Wang, C. H. Liu, H. Yang, G. Wang, and K. K. Leung, “Ensuring threshold AoI for UAV-assisted mobile crowdsensing by multi-agent deep reinforcement learning with transformer,” IEEE/ACM Trans. Netw., early access, Jul. 12, 2023, doi: 10.1109/TNET.2023.3289172.
[33] X. Dong, Z. You, T. H. Luan, Q. Yao, Y. Shen, and J. Ma, “Optimal mobile crowdsensing incentive under sensing inaccuracy,” IEEE Internet Things J., vol. 8, no. 10, pp. 8032–8043, May 2021.
[34] K. Jiang et al., “A reinforcement learning-based incentive mechanism for task allocation under spatiotemporal crowdsensing,” IEEE Trans. Comput. Social Syst., early access, Apr. 10, 2023, doi: 10.1109/TCSS.2023.3263821.
[35] G. Ji, B. Zhang, G. Zhang, and C. Li, “Online incentive mechanisms for socially-aware and socially-unaware mobile crowdsensing,” IEEE Trans. Mobile Comput., to be published, doi: 10.1109/TMC.2023.3321701.
[36] C. Xu and W. Song, “Decentralized task assignment for mobile crowdsensing with multi-agent deep reinforcement learning,” IEEE Internet Things J., vol. 10, no. 18, pp. 16564–16578, Sep. 2023.
[37] S. Huang et al., “Gather or scatter: Stackelberg game based task decision for blockchain-assisted socially-aware crowdsensing framework,” IEEE Internet Things J., vol. 11, no. 2, pp. 1939–1951, Jan. 2024, doi: 10.1109/JIOT.2023.3284477.
[38] R. Ding, Y. Xu, F. Gao, and X. Shen, “Trajectory design and access control for air–ground coordinated communications system with multi-agent deep reinforcement learning,” IEEE Internet Things J., vol. 9, no. 8, pp. 5785–5798, Apr. 2022.
[39] J. Hu et al., “Towards demand-driven dynamic incentive for mobile crowdsensing systems,” IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4907–4918, Jul. 2020.
[40] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Berlin, Germany: Springer, 2004.
[41] F. Li, X. Li, Y. Fu, P. Zhao, and S. Liu, “A secure and privacy-preserving incentive mechanism for vehicular crowdsensing with data quality assurance,” in Proc. IEEE Veh. Technol. Conf., 2021, pp. 1–5.
[42] F. Li, Y. Fu, P. Zhao, and C. Li, “An incentive mechanism for nondeterministic vehicular crowdsensing with blockchain,” in Proc. IEEE Int. Conf. Commun., 2020, pp. 1074–1079.
[43] J. Yuan, Y. Zheng, C. Zhang, W. Xie, and Y. Huang, “T-drive: Driving directions based on taxi trajectories,” in Proc. ACM SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2010, pp. 99–108.
[44] B. Yin, J. Li, and X. Wei, “Rational task assignment and path planning based on location and task characteristics in mobile crowdsensing,” IEEE Trans. Comput. Social Syst., vol. 9, no. 3, pp. 781–793, Jun. 2022.

Mengge Li (Student Member, IEEE) received the B.E. degree in information management and information systems from Shangqiu Normal University, Shangqiu, China, in 2019. She is currently working toward the Ph.D. degree in computer science and technology with Shaanxi Normal University, Xi’an.
Her research interests include applications of reinforcement learning and mobile crowdsensing.

Miao Ma received the M.E. degree in technology of computer application from the Xi’an University of Science and Technology, Xi’an, China, in 2002, and the Ph.D. degree in signal and information processing from Northwest Polytechnic University, Xi’an, in 2005.
From 2006 to 2009, she was a Postdoctoral Researcher with Northwestern Polytechnical University. She is currently a Professor with the School of Computer Science, Shaanxi Normal University, Xi’an. Her research interests include image processing, video analysis on educational big data, and mobile crowdsensing.

Liang Wang (Member, IEEE) received the B.S. degree in telecommunications engineering and the Ph.D. degree in communication and information systems from Xidian University, Xi’an, China, in 2009 and 2015, respectively.
From 2018 to 2019, he was a Visiting Scholar with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA. He is currently an Associate Professor with the School of Computer Science, Shaanxi Normal University, Xi’an. His research interests include Internet of Things, mobile edge computing, and applications of reinforcement learning.

Zhao Pei (Member, IEEE) received the B.E., M.S., and Ph.D. degrees from Northwestern Polytechnical University, Xi’an, China, in 2005, 2008, and 2013, respectively.
From 2010 to 2011, he was a joint Ph.D. Student with the Department of Computing Science, University of Alberta, Edmonton, AB, Canada. He is currently a Professor with the School of Computer Science, Shaanxi Normal University, Xi’an. His research interests include camera array synthetic aperture imaging, object detection and tracking, and human body motion analysis.

Jie Ren (Member, IEEE) received the Ph.D. degree in computer architecture from Northwest University, Xi’an, China, in 2017.
He is currently an Assistant Professor with the Computer Science Department, Shaanxi Normal University, Xi’an, China. His research interests include mobile system optimization, runtime scheduling, and contrastive learning in natural language processing.

Bo Yang received the M.E. degree in computer software and the Ph.D. degree in cryptography from Xidian University, Xi’an, China, in 1993 and 1999, respectively.
He is currently a Professor with the School of Computer Science, Shaanxi Normal University, Xi’an. His research interests include information theory and cryptography.
Dr. Yang has served as a Program Chair for the Fourth China Conference on Information and Communications Security, and the General Chair for the 5th Joint Workshop on Information Security.