Multiagent Deep Reinforcement Learning Based Incentive Mechanism For Mobile Crowdsensing in Intelligent Transportation Systems
motivation of the vehicles to participate in the tasks is very low unless satisfactory compensation can be obtained. Therefore, the design of incentive mechanisms plays a vital role in stimulating vehicles to take part in sensing activities and submit high-quality results [8], [9], [10].

Most previous works on task assignment and incentive mechanisms focus on individual tasks, where each task is done independently. In reality, multiple tasks are often coupled with each other in common resource collections. To realize the full utilization of resources, research works [11], [12] on multitask allocation have attracted much attention recently, but task characteristics are still ignored. In practice, clustering features in terms of geographic location distribution (such as gathering information on parking lots near department stores) are common in scenarios with many tasks. We call such tasks clustering tasks. When these tasks are handled uniformly, the effectiveness of task allocation can be greatly improved.

Furthermore, most available multitask allocation strategies either aim at the interests of the platform, such as maximizing the utility or welfare of the platform [11], [13], [14], [15], [16], [17], or optimize the interests of the participants, for instance, improving the schedule of performed tasks or the benefit of the participants [8], [9], [18], [19], [20], [21]. However, only when the platform's and participants' interests are considered simultaneously can the overall interests of the entire system be improved to the fullest extent. Meanwhile, path planning is crucial for maximizing the benefits of the vehicle participants. Nevertheless, the available MCS schemes in ITS ignore the study of vehicle path planning.

Individual optimization goals, such as improving the quality of sensing results [22], reducing expenditure [23], [24], or reducing delay, apply only to the requirements of some specific applications. For example, considering only the quality of sensing results may cause large waiting delays and recruitment costs, which is intolerable for most other applications on the platform. Therefore, during task assignment, it is meaningful to simultaneously consider the bid price, time consumption, and task coverage to optimize the monetary expenditure, the time consumption for completing the tasks, and the communication cost of the system.

To overcome the above-mentioned research gaps, we design a two-stage incentive mechanism based on task composition over geographic locations to optimize the interests of vehicles and the platform simultaneously. First, the article maximizes the net income of vehicles by selecting appropriate paths for each vehicle based on multiagent deep reinforcement learning (MADRL), which has proved to be very effective in vehicle path planning [25]. Then, the satisfaction of the platform, defined as a combination of the bid price, time consumption, and the number of covered tasks, is maximized by assigning the clustering tasks to appropriate bidding vehicles. Based on the optimization of these two points, the global interests of the whole MCS system can be maximized. In detail, the main contributions of the article are as follows.
1) We develop a framework utilizing multiagent deep deterministic policy gradient (MADDPG), a typical MADRL technology suited to the rapidly changing surrounding traffic and the uncertainty of parameters, to make each vehicle learn the optimal path selection policy that maximizes its net income through constant observation and adjustment.
2) By considering the bid price, time consumption, and task coverage of the paths submitted by each vehicle, we refine the satisfaction that each vehicle can bring to the platform using the analytic hierarchy process (AHP). Based on this, we propose the maximum-platform-satisfaction winning-bid selection (MSBS) problem in MCS, which is proved to be NP-hard.
3) A high-efficiency task allocation algorithm, namely, the optimized allocation strategy of budget-constrained tasks (SBCT), is proposed, which is expected to make the utmost of the budget and maximize platform satisfaction. The vehicles that perform the tasks are rewarded based on satisfaction to compensate for their cost.
4) We perform extensive simulations on an actual vehicle trajectory dataset. The experiments demonstrate that the proposed MADDPG and SBCT incentive mechanism can achieve approximately optimal performance in terms of the total net income, the platform's satisfaction, the task completion rate, and the average remaining budget.

The rest of this article is organized as follows. Related work is summarized in Section II. Section III introduces the system model and formulates the vehicle path selection problem. We describe the proposed MADDPG-based path selection strategy in Section IV. The MSBS problem and the proposed optimized task allocation strategy are given in Section V. Extensive experiments in Section VI evaluate the proposed strategies. Finally, Section VII concludes this article.
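To make the interplay of the two stages concrete before the formal treatment, the following minimal Python sketch walks through one sensing round. All names here (Bid, sensing_round, the satisfaction field, the greedy stand-in for SBCT) are our own illustrative placeholders, not the paper's API; the real bid, satisfaction, and payment definitions are given in Sections III-V.

```python
import random
from dataclasses import dataclass

@dataclass
class Bid:
    vehicle_id: int
    price: float        # bid price submitted with the chosen path
    true_cost: float    # vehicle's true cost for that path and task set
    satisfaction: float # placeholder for the AHP-based score of Section V

def sensing_round(num_vehicles: int, budget: float) -> dict:
    """One round of the two-stage mechanism (illustrative stand-ins only).

    Stage 1: each vehicle submits a bid for the path its MADDPG actor
    selects (random values stand in for the learned policy here).
    Stage 2: the platform selects winners within its budget (a greedy
    rule stands in for the SBCT allocation of Section V) and pays them.
    """
    # Stage 1: distributed path selection and bidding.
    bids = [Bid(v, price=random.uniform(2, 6), true_cost=random.uniform(1, 3),
                satisfaction=random.random()) for v in range(num_vehicles)]

    # Stage 2: centralized budget-constrained winner selection.
    winners, spent = [], 0.0
    for bid in sorted(bids, key=lambda b: b.satisfaction, reverse=True):
        if spent + bid.price <= budget:
            winners.append(bid)
            spent += bid.price

    # Each vehicle's net income (payment minus true cost) becomes the
    # reward signal it uses to adjust its path selection next round.
    return {b.vehicle_id: (b.price - b.true_cost if b in winners else 0.0)
            for b in bids}

print(sensing_round(num_vehicles=10, budget=45.0))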
II. RELATED WORK

TIs, the platform, and participants are the three key roles in MCS. Many available task allocation schemes and incentive mechanisms have been proposed based on the interests of TIs, the platform, and participants, respectively.

A. Incentive Mechanism Based on the TIs

From the perspective of the TIs, Wang et al. [12] designed a two-stage task distribution approach to improve the quality of sensing results and economize the budget. Hui et al. [24] constructed a centerless collaborative MCS framework using blockchain and a coalition formation algorithm to minimize the cost. Tan et al. [26] developed a three-stage framework that exploits realistic relationships in social networks to form groups to improve task coverage and cooperation quality. Tang et al. [27] developed an algorithm using reinforcement learning (RL) to improve spatial and temporal coverage. Liu et al. [28] developed a cooperative information sensing framework utilizing edge intelligence to improve spatial-temporal evenness. Cao et al. [29] developed an MCS-oriented motivation strategy to encourage nearby participants to share their resources and a task migration method to optimize the utility. Li et al. [30] proposed concurrent-task and safety-emergency task allocation methods based on RL to optimize the utility of concurrent tasks while meeting the requirements of safety emergency tasks. Wang et al. [31] designed a participant recruitment scheme with multiple stages using the combinatorial multiarmed bandit algorithm to optimize the task execution effect. Wang et al. [32] proposed a distributed MADRL framework for multiunmanned aerial vehicle trajectory planning to jointly minimize the
TABLE II SUMMARY OF NOTATION
Fig. 3. MADDPG-based vehicle path selection architecture.

along the selected path does not exceed its deadline for reaching the destination, and constraint C2 ensures that vehicle $v$ selects at most one path.

In recent years, the advantages of MADDPG in path design have been confirmed [25]. Therefore, we use MADDPG to make each vehicle learn the optimal path selection strategy.

IV. PROPOSED MADDPG-BASED PATH SELECTION STRATEGY

MADDPG [8], [38] is a typical pattern of machine learning, which makes agents learn policies to maximize benefits. At sensing round $t$, each vehicle, acting as an agent, outputs an action $a_v^t$ after observing the environment $o_v^t$. Later, it will get a reward $r_v^t$, then continue to observe the changing environment and adjust its strategy, as shown in Fig. 3. In a system with multiple vehicles, every vehicle's policy constantly changes during training. For a vehicle, the other vehicles are all part of the environment, so the whole environment becomes nonstationary, which makes convergence difficult. To this end, we employ two critical techniques in MADDPG [8].
1) Centralized training and distributed execution [8]: Centralized learning is utilized to train the critic and actor networks. During execution, each actor can run simply by knowing local information.
2) Augmented critic [8]: The critic network of each vehicle is reinforced by inferring the policies of the other agents.

For the above problem $P1$, the state $o_v^t$, action $a_v^t$, and reward $r_v^t$ of each vehicle are denoted as follows.

paths to the destination for vehicle $v$, and $M(P^v) = \{\{P_1^v, S_1^v, c_1^v, b_1^v, wt_1^v, tr_1^v\}, \ldots, \{P_k^v, S_k^v, c_k^v, b_k^v, wt_k^v, tr_k^v\}, \ldots, \{P_{l_v}^v, S_{l_v}^v, c_{l_v}^v, b_{l_v}^v, wt_{l_v}^v, tr_{l_v}^v\}\}$; $\xi_v^{t-1}$ represents the path index selected by vehicle $v$ at sensing round $t-1$, i.e., $\xi_v^{t-1} = k$ (in case of $\zeta_k^v = 1$ at sensing round $t-1$), and $\psi_v^{t-1}$ indicates the net income of vehicle $v$ at sensing round $t-1$.

B. Action Space

At sensing round $t$, each vehicle $v$ observes the decisions of the platform in the first $L$ sensing rounds and then gives its path selection action at the current sensing round on the basis of the output of the actor network
$$a_v^t = \mu_v\big(o_v^t \mid \phi^{\mu_v}\big). \tag{4}$$
Therefore, the path selected by vehicle $v$ at sensing round $t$ is defined as
$$\xi_v^t = \big\lceil a_v^t\, l_v \big\rceil \tag{5}$$
where $\lceil x \rceil$ denotes rounding $x$ up to the nearest integer.

C. Reward Function

The reward that vehicle $v$ obtains at sensing round $t$ is denoted as
$$r_v^t = \sum_{k=1}^{l_v} \zeta_k^v(t)\, \theta_v^t \big(pa_k^v(t) - c_k^v(t)\big) \tag{6}$$
where $\zeta_k^v(t)$ indicates whether vehicle $v$ selects path $P_k^v$ at sensing round $t$, $pa_k^v(t)$ is the payoff paid by the platform that vehicle $v$ can get by selecting path $P_k^v$ and executing task set $S_k^v$ at sensing round $t$, and $c_k^v(t)$ is the true cost of vehicle $v$ for path $P_k^v$ and task set $S_k^v$ at sensing round $t$. $\theta_v^t$ and $pa_k^v(t)$ are determined by the platform's task allocation strategy.

D. MADDPG Networks Update

MADDPG includes a critic network $Q_v(o, a \mid \phi^{Q_v})$ and an actor network $\mu_v(o_v \mid \phi^{\mu_v})$ for each vehicle $v$. Each actor network $\mu_v(o_v \mid \phi^{\mu_v})$ has a corresponding target actor network $\mu'_v(o_v \mid \phi^{\mu'_v})$, and each critic network $Q_v(o, a \mid \phi^{Q_v})$ has a target critic network $Q'_v(o', a' \mid \phi^{Q'_v})$. The critic network [8], [38] of vehicle $v$ is updated with
$$L_v = \frac{1}{G} \sum_{g=1}^{G} \Big(y_v^g - Q_v\big(o^g, a^g \mid \phi^{Q_v}\big)\Big)^2 \tag{7}$$
where $o^g = \{o_1^g, \ldots, o_V^g\}$, $a^g = \{a_1^g, \ldots, a_V^g\}$, $G$ denotes the mini-batch size, and $y_v^g$ is the objective value calculated by
$$y_v^g = r_v^g + \gamma\, Q'_v\big(o^{(g+1)}, a^{(g+1)} \mid \phi^{Q'_v}\big). \tag{8}$$
The target actor network is softly updated with
$$\phi^{\mu'_v} \leftarrow \tau\, \phi^{\mu_v} + (1 - \tau)\, \phi^{\mu'_v} \tag{10}$$
where $\tau$ is a soft update factor.

For each sensing round, after the selected paths of the vehicles are determined, based on the information uploaded by the vehicles, the MCS platform performs the task allocation to determine $\theta_v$ and $pa_k^v$.
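To make (4), (5), (7), (8), and (10) concrete, here is a minimal PyTorch-style sketch of the path selection mapping, one critic update, and one soft target update for a single vehicle. The network shapes, replay-buffer format, and function names are our own illustrative assumptions; the paper does not specify an implementation.

```python
import math
import torch
import torch.nn as nn

def select_path(actor, obs_v, l_v):
    """Eqs. (4)-(5): the actor outputs a continuous action in (0, 1];
    the selected path index is the ceiling of a_v^t * l_v."""
    a_v = actor(obs_v)                    # mu_v(o_v^t | phi^{mu_v})
    return math.ceil(a_v.item() * l_v)   # xi_v^t

def critic_update(critic, target_critic, target_actors,
                  batch, optimizer, gamma):
    """Eqs. (7)-(8): regress Q_v(o^g, a^g) onto y_v^g over a mini-batch.
    obs / next_obs are lists of per-vehicle observation tensors, so the
    critic sees all vehicles' information (centralized training)."""
    obs, act, rew, next_obs = batch
    with torch.no_grad():
        # Next joint action comes from the target actors (one per vehicle).
        next_act = torch.cat([ta(o) for ta, o in zip(target_actors, next_obs)],
                             dim=-1)
        y = rew + gamma * target_critic(torch.cat(next_obs, dim=-1), next_act)
    q = critic(torch.cat(obs, dim=-1), act)
    loss = nn.functional.mse_loss(q, y)   # (1/G) * sum_g (y_v^g - Q_v)^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def soft_update(target_net, net, tau):
    """Eq. (10): phi' <- tau * phi + (1 - tau) * phi'."""
    for tp, p in zip(target_net.parameters(), net.parameters()):
        tp.data.mul_(1.0 - tau).add_(tau * p.data)
```

During decentralized execution only select_path is needed; the critic, target networks, and joint observations are used exclusively at training time, matching the centralized-training, distributed-execution pattern described above.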
Fig. 6. Convergence of the proposed MADDPG and SBCT incentive mechanism. (a) Path selection of each vehicle. (b) Net income of each vehicle. (c) Total net
income of selected vehicles. (d) Selection of platform for each vehicle. (e) Satisfaction of platform. (f) Total bid price of selected vehicles versus budget.
of this algorithm is extremely high. In the worst case, the time complexity of enumerating all vehicles' optional path combinations is $O((l_v)^V)$, and the time complexity of the platform traversing all optional vehicle subsets is $O(2^V)$. Therefore, the time complexity of the BFS scheme is $O((l_v)^V + 2^V)$.
2) Greedy: Each vehicle greedily selects the path covering the largest number of tasks. The MCS platform greedily chooses the vehicles with the maximum satisfaction within the budget.
3) Random: Each moving vehicle randomly chooses a path from its available paths, and the MCS platform randomly decides the vehicles to perform the tasks according to the budget. (A sketch of these two baseline selections follows the list.)
4) Rational [44]: Allocation rationality is calculated by considering geographical information and task characteristics (i.e., route distance, task similarity, and task priority). A set of task locations is assigned to a set of workers, and location access sequences are generated to achieve the greatest allocation rationality.
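For reference, here is a minimal sketch of the greedy and random platform-side selections as we understand them from items 2) and 3). The satisfaction field is a placeholder for the AHP-based score of Section V, not the paper's exact computation.

```python
import random

def greedy_selection(bids, budget):
    """Platform greedily picks the highest-satisfaction bids within budget."""
    chosen, spent = [], 0.0
    for bid in sorted(bids, key=lambda b: b["satisfaction"], reverse=True):
        if spent + bid["price"] <= budget:
            chosen.append(bid)
            spent += bid["price"]
    return chosen

def random_selection(bids, budget):
    """Platform picks bids in random order until the budget is exhausted."""
    chosen, spent = [], 0.0
    for bid in random.sample(bids, len(bids)):
        if spent + bid["price"] <= budget:
            chosen.append(bid)
            spent += bid["price"]
    return chosen
```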
B. Behavior of the Proposed MADDPG and SBCT Incentive Mechanism

Fig. 6 comprehensively demonstrates the convergence of the proposed MADDPG and SBCT mechanism. In the beginning, each vehicle explores a different path selection strategy in Fig. 6(a). As a countermeasure, the platform dynamically adjusts its selection for each vehicle in Fig. 6(d), resulting in large fluctuations in each vehicle's net income in Fig. 6(b) and the platform's satisfaction in Fig. 6(e). If the paths uploaded by the vehicles do not match each task, the platform will tend to select fewer vehicles in Fig. 6(b) and (d) because the platform favors vehicles that supply extensive task coverage, low bid price, and low time consumption. Therefore, each vehicle must compete with others and dynamically choose its path to keep a relatively high net income in Fig. 6(b). After about 900 iterations, the strategies of each vehicle and the platform all tend to be stable, each vehicle obtains a net income no less than its initial net income in Fig. 6(b), and the platform's satisfaction reaches a high level in Fig. 6(e). Although there are fluctuations in the individual vehicles' net income, the total net income gradually increases with iterations and eventually stabilizes in Fig. 6(c). As expected, the total bid prices of all selected vehicles converge to the platform's budget in Fig. 6(f), which shows that the MADDPG and SBCT mechanism can take full advantage of the budget.

C. Comparison With Existing Algorithms

Apart from the total net income and the satisfaction of the platform, we introduce two additional indicators to assess the performance, namely, the task completion rate and the platform's average remaining budget: the former is the proportion of the number of accomplished tasks to the total number of published tasks $S$; the latter is the average difference between the platform's budget and the selected vehicles' total bid price. Both can be computed directly from these definitions, as in the sketch below.
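A small sketch of the two indicators (the variable names are ours, not the paper's):

```python
def task_completion_rate(num_completed_tasks, total_tasks_S):
    """Fraction of the S published tasks that were accomplished."""
    return num_completed_tasks / total_tasks_S

def average_remaining_budget(budgets, selected_bid_totals):
    """Average of (budget - sum of selected vehicles' bid prices) over runs."""
    remaining = [b - s for b, s in zip(budgets, selected_bid_totals)]
    return sum(remaining) / len(remaining)
```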
Fig. 7. Comparison of the proposed MADDPG and SBCT incentive mechanism and others on the total net income, satisfaction of platform, task completion rate,
and average remaining budget under different platform budgets.
Fig. 8. Comparison of the proposed MADDPG and SBCT incentive mechanism and others on the total net income, satisfaction of platform, task completion rate,
and average remaining budget under different task numbers.
the platform’s budget, so as the budget increases, all algorithms Supposing that B = 45, and V = 10 or 15, Fig. 8 illustrates
can be recruiting more vehicles. In Fig. 7(c), the task completion the influence of increasing the task numbers on performance.
rates of the proposed MADDPG and SBCT mechanism and Fig. 8(a) and (b) illustrate that with the change in the task
the BFS algorithm can reach 100% and are even higher than numbers, the general vehicles’ net income and the platform
those of the random, greedy, and rational algorithms under satisfaction of all algorithms do not change much because the
any budget. This is because that the proposed MADDPG and total bid prices of all selected vehicles converge to the platform’s
SBCT mechanism through the path selection strategy based on budget, if the budget remains unchanged, the selected vehicles
MADDPG ensures each vehicle learn a good strategy and make are relatively stable in case of the same vehicles. However, we
the utmost of the budget to assign as many tasks as possible. For can find that the platform satisfaction of the same algorithm
the greedy algorithm, each vehicle chooses a path that covers having the same tasks is generally higher when V is 15 than
more tasks to obtain more profits, which results in some task when V is 10 from Fig. 8(b). This is because, under the same
locations not being covered by any vehicle’s path, so the task budget, when the number of vehicles is larger, the platform
completion rate of the greedy algorithm is relatively low. For the has the opportunity to select vehicles with more task coverage,
rational algorithm, the platform chooses the appropriate route lower bid prices, and shorter time consumption, which improves
for each vehicle according to the distance between the current the platform’s satisfaction and task completion rate. With the
task position and the previous task position in each alternative tasks’ increase, the random, greedy, and rational algorithms’
path and the priority of the tasks, so it can achieve a higher task task completion rates decrease in Fig. 8(c), and accordingly, the
completion rate than the greedy algorithm, which blindly selects average remaining budgets of the random, greedy, and rational
the path covering more tasks for each vehicle. However, because algorithms increase in Fig. 8(d). This is because that with the
the rational algorithm ignores the attention to the time cost and number of tasks increases, the paths uploaded by all bidding
price of the paths, when the platform budget is limited, and each vehicles in the random, greedy, and rational algorithms are
vehicle has a clear deadline to the destination, its task completion difficult to cover all tasks, resulting in task allocation failure.
rate cannot show good performance compared with the proposed However, since the rational algorithm chooses the path with the
MADDPG and SBCT mechanism. Fig. 7(d) illustrates that the location of adjacent tasks as close as possible, its task completion
average remaining budget of the proposed MADDPG and SBCT rate decreases more slowly as the number of tasks increases
mechanism is less than those of the rational, greedy, and random than those of the greedy and random algorithms. Fig. 9 shows
algorithms because our mechanism can improve the platform’s the influence of changes in vehicle number on performance. In
satisfaction by taking full advantage of the budget. In general, Fig. 9, with the increasing number of vehicles, the change in total
under different budgets, the proposed incentive mechanism is net income for all algorithms is little, while the satisfaction of
better than other baseline schemes in terms of total net income, the platform and task completion rate of all algorithms are on the
platform satisfaction, and task completion rate and can make rise. Because when the platform’s budget is unchanged, the total
fuller use of the platform budget. For example, under different bid prices of all selected vehicles change little. From (16), we
platform budgets, when the number of vehicles is ten, and know that the total net income converges to the total bid prices
the number of tasks is ten, the total net income of vehicles, of all selected vehicles, so the total net income changes little.
platform satisfaction, and task completion rate of the proposed Furthermore, with the increase in vehicle numbers, vehicles with
MADDPG and SBCT incentive mechanism are, respectively, more task coverage and lower bid prices can be selected, which
12.01%, 18.57%, and 30.47% higher than those of the rational makes the platform satisfaction and task completion rate both
algorithm. increase, and correspondingly, the average remaining budget of
Fig. 9. Comparison of the proposed MADDPG and SBCT incentive mechanism and others on the total net income, satisfaction of platform, task completion rate,
and average remaining budget under different vehicle numbers.
the platform with the random, greedy, and rational algorithms shows a downward trend.

VII. CONCLUSION

This article studied the vehicle path selection problem and the sensing task allocation problem under budget constraints in MCS. We proposed a distributed path selection algorithm based on MADRL, in which each vehicle observes its transaction records and adjusts its path selection strategy iteratively by interacting with the surrounding environment. The indicator of the satisfaction level was introduced to characterize the satisfaction that each vehicle can provide to the platform by considering the time consumption, the bid price, and the number of covered tasks on the path that each vehicle uploads to the platform. Then, we proposed the MSBS problem under budget constraints, which is proven to be NP-hard, and designed a high-efficiency task allocation algorithm, SBCT, that makes the utmost of the platform's budget to maximize satisfaction. We compared the proposed MADDPG and SBCT incentive mechanism with the optimal results obtained by the BFS algorithm, as well as with the rational, greedy, and random algorithms. Abundant experimental results demonstrated that both the platform and the vehicles learn strategies that attain approximately optimal returns. In future work, we will further consider protecting user privacy and design more effective mechanisms to cope with highly dynamic environments and meet stringent real-time requirements.

REFERENCES

[1] P. Arthurs, L. Gillam, P. Krause, N. Wang, K. Halder, and A. Mouzakitis, "A taxonomy and survey of edge cloud computing for intelligent transportation systems and connected vehicles," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 7, pp. 6206-6221, Jul. 2022.
[2] R. Gao, F. Sun, W. Xing, D. Tao, J. Fang, and H. Chai, "CTTE: Customized travel time estimation via mobile crowdsensing," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 19335-19347, Oct. 2022.
[3] X. Zhu, Y. Luo, A. Liu, W. Tang, and M. Z. A. Bhuiyan, "A deep learning-based mobile crowdsensing scheme by predicting vehicle mobility," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 7, pp. 4648-4659, Jul. 2021.
[4] X. Liu, W. Chen, Y. Xia, and R. Shen, "TRAMS: A secure vehicular crowdsensing scheme based on multi-authority attribute-based signature," IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 12790-12800, Aug. 2022.
[5] P. Mohan, V. N. Padmanabhan, and R. Ramjee, "Nericell: Rich monitoring of road and traffic conditions using mobile smartphones," in Proc. ACM SenSys, 2008, pp. 323-336.
[6] P. Dutta et al., "Common sense: Participatory urban sensing using a network of handheld air quality monitors," in Proc. 7th ACM Conf. Embedded Netw. Sensor Syst., 2009, pp. 349-350.
[7] I. Schweizer et al., "Noisemap: Multi-tier incentive mechanisms for participative urban sensing," in Proc. 3rd Int. Workshop Sens. Appl. Mobile Phones, 2012, pp. 1-5.
[8] B. Gu, X. Yang, Z. Lin, W. Hu, M. Alazab, and R. Kharel, "Multiagent actor-critic network-based incentive mechanism for mobile crowdsensing in industrial systems," IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6182-6191, Sep. 2021.
[9] Y. Zhao and C. H. Liu, "Social-aware incentive mechanism for vehicular crowdsensing by deep reinforcement learning," IEEE Trans. Intell. Transp. Syst., vol. 22, no. 4, pp. 2314-2325, Apr. 2021.
[10] X. Chen, L. Zhang, Y. Pang, B. Lin, and Y. Fang, "Timeliness-aware incentive mechanism for vehicular crowdsourcing in smart cities," IEEE Trans. Mobile Comput., vol. 21, no. 9, pp. 3373-3387, Sep. 2022.
[11] X. Li and X. Zhang, "Multi-task allocation under time constraints in mobile crowdsensing," IEEE Trans. Mobile Comput., vol. 20, no. 4, pp. 1494-1510, Apr. 2021.
[12] L. Wang, Z. Yu, D. Zhang, B. Guo, and C. H. Liu, "Heterogeneous multi-task assignment in mobile crowdsensing using spatiotemporal correlation," IEEE Trans. Mobile Comput., vol. 18, no. 1, pp. 84-97, Jan. 2019.
[13] Y. Huang et al., "OPAT: Optimized allocation of time-dependent tasks for mobile crowdsensing," IEEE Trans. Ind. Informat., vol. 18, no. 4, pp. 2476-2485, Apr. 2022.
[14] J. Wang, X. Feng, T. Xu, H. Ning, and T. Qiu, "Blockchain-based model for nondeterministic crowdsensing strategy with vehicular team cooperation," IEEE Internet Things J., vol. 7, no. 9, pp. 8090-8098, Sep. 2020.
[15] Y. Liu, N. Chen, X. Zhang, X. Liu, Y. Yi, and N. Zhao, "Research on multi-task assignment model based on task similarity in crowdsensing," in Proc. IEEE Int. Conf. Commun., 2021, pp. 523-528.
[16] Y. Liu, H. Wang, M. Peng, J. Guan, and Y. Wang, "An incentive mechanism for privacy-preserving crowdsensing via deep reinforcement learning," IEEE Internet Things J., vol. 8, no. 10, pp. 8616-8631, May 2021.
[17] Y. Pei, G. Zhang, F. Hou, and G. Yang, "Online optimal algorithm design for mobile crowdsensing with dual-role users," in Proc. IEEE Veh. Technol. Conf., 2021, pp. 1-5.
[18] D. Deng, C. Shahabi, and U. Demiryurek, "Maximizing the number of worker's self-selected tasks in spatial crowdsourcing," in Proc. ACM SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2013, pp. 324-333.
[19] M. H. Cheung, F. Hou, and J. Huang, "Delay-sensitive mobile crowdsensing: Algorithm design and economics," IEEE Trans. Mobile Comput., vol. 17, no. 12, pp. 2761-2774, Dec. 2018.
[20] W. Li, B. Jia, H. Xu, Z. Zong, and T. Watanabe, "A multi-task scheduling mechanism based on ACO for maximizing workers' benefits in mobile crowdsensing service markets with the Internet of Things," IEEE Access, vol. 7, pp. 41463-41469, 2019.
[21] X. Tao and W. Song, "Location-dependent task allocation for mobile crowdsensing with clustering effect," IEEE Internet Things J., vol. 6, no. 1, pp. 1029-1045, Feb. 2019.
[22] G. Gao, H. Huang, M. Xiao, J. Wu, Y.-E. Sun, and Y. Du, "Budgeted unknown worker recruitment for heterogeneous crowdsensing using CMAB," IEEE Trans. Mobile Comput., vol. 21, no. 11, pp. 3895-3911, Nov. 2022.
[23] G. Gao, M. Xiao, J. Wu, L. Huang, and C. Hu, "Truthful incentive mechanism for nondeterministic crowdsensing with vehicles," IEEE Trans. Mobile Comput., vol. 17, no. 12, pp. 2982-2997, Dec. 2018.
[24] Y. Hui et al., "BCC: Blockchain-based collaborative crowdsensing in autonomous vehicular networks," IEEE Internet Things J., vol. 9, no. 6, pp. 4518-4532, Mar. 2022.
[25] K. Wei et al., "High-performance UAV crowdsensing: A deep reinforcement learning approach," IEEE Internet Things J., vol. 9, no. 19, pp. 18487-18499, Oct. 2022.
[26] W. Tan, L. Zhao, B. Li, L. Xu, and Y. Yang, "Multiple cooperative task allocation in group-oriented social mobile crowdsensing," IEEE Trans. Serv. Comput., vol. 15, no. 6, pp. 3387-3401, Nov./Dec. 2022.
[27] B. Tang, Z. Li, and K. Han, "Multi-agent reinforcement learning for mobile crowdsensing systems with dedicated vehicles on road networks," in Proc. IEEE Int. Intell. Transp. Syst. Conf., 2021, pp. 3584-3589.
[28] L. Liu et al., "Evenness-aware data collection for edge-assisted mobile crowdsensing in internet of vehicles," IEEE Internet Things J., vol. 10, no. 1, pp. 1-16, Jan. 2023.
[29] B. Cao, S. Xia, J. Han, and Y. Li, "A distributed game methodology for crowdsensing in uncertain wireless scenario," IEEE Trans. Mobile Comput., vol. 19, no. 1, pp. 15-28, Jan. 2020.
[30] M. Li, M. Ma, L. Wang, B. Yang, T. Wang, and J. Sun, "Multitask-oriented collaborative crowdsensing based on reinforcement learning and blockchain for intelligent transportation system," IEEE Trans. Ind. Informat., vol. 19, no. 9, pp. 9503-9514, Sep. 2023.
[31] H. Wang, Y. Yang, E. Wang, W. Liu, Y. Xu, and J. Wu, "Truthful user recruitment for cooperative crowdsensing task: A combinatorial multi-armed bandit approach," IEEE Trans. Mobile Comput., vol. 22, no. 7, pp. 4314-4331, Jul. 2023.
[32] H. Wang, C. H. Liu, H. Yang, G. Wang, and K. K. Leung, "Ensuring threshold AoI for UAV-assisted mobile crowdsensing by multi-agent deep reinforcement learning with transformer," IEEE/ACM Trans. Netw., early access, Jul. 12, 2023, doi: 10.1109/TNET.2023.3289172.
[33] X. Dong, Z. You, T. H. Luan, Q. Yao, Y. Shen, and J. Ma, "Optimal mobile crowdsensing incentive under sensing inaccuracy," IEEE Internet Things J., vol. 8, no. 10, pp. 8032-8043, May 2021.
[34] K. Jiang et al., "A reinforcement learning-based incentive mechanism for task allocation under spatiotemporal crowdsensing," IEEE Trans. Comput. Social Syst., early access, Apr. 10, 2023, doi: 10.1109/TCSS.2023.3263821.
[35] G. Ji, B. Zhang, G. Zhang, and C. Li, "Online incentive mechanisms for socially-aware and socially-unaware mobile crowdsensing," IEEE Trans. Mobile Comput., to be published, doi: 10.1109/TMC.2023.3321701.
[36] C. Xu and W. Song, "Decentralized task assignment for mobile crowdsensing with multi-agent deep reinforcement learning," IEEE Internet Things J., vol. 10, no. 18, pp. 16564-16578, Sep. 2023.
[37] S. Huang et al., "Gather or scatter: Stackelberg game based task decision for blockchain-assisted socially-aware crowdsensing framework," IEEE Internet Things J., vol. 11, no. 2, pp. 1939-1951, Jan. 2024, doi: 10.1109/JIOT.2023.3284477.
[38] R. Ding, Y. Xu, F. Gao, and X. Shen, "Trajectory design and access control for air-ground coordinated communications system with multiagent deep reinforcement learning," IEEE Internet Things J., vol. 9, no. 8, pp. 5785-5798, Apr. 2022.
[39] J. Hu et al., "Towards demand-driven dynamic incentive for mobile crowdsensing systems," IEEE Trans. Wireless Commun., vol. 19, no. 7, pp. 4907-4918, Jul. 2020.
[40] H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Berlin, Germany: Springer, 2004.
[41] F. Li, X. Li, Y. Fu, P. Zhao, and S. Liu, "A secure and privacy preserving incentive mechanism for vehicular crowdsensing with data quality assurance," in Proc. IEEE Veh. Technol. Conf., 2021, pp. 1-5.
[42] F. Li, Y. Fu, P. Zhao, and C. Li, "An incentive mechanism for nondeterministic vehicular crowdsensing with blockchain," in Proc. IEEE Int. Conf. Commun., 2020, pp. 1074-1079.
[43] J. Yuan, Y. Zheng, C. Zhang, W. Xie, and Y. Huang, "T-drive: Driving directions based on taxi trajectories," in Proc. SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2010, pp. 99-108.
[44] B. Yin, J. Li, and X. Wei, "Rational task assignment and path planning based on location and task characteristics in mobile crowdsensing," IEEE Trans. Comput. Social Syst., vol. 9, no. 3, pp. 781-793, Jun. 2022.

Miao Ma received the M.E. degree in technology of computer application from the Xi'an University of Science and Technology, Xi'an, China, in 2002, and the Ph.D. degree in signal and information processing from Northwest Polytechnic University, Xi'an, in 2005.
As a Postdoctoral Researcher, she did research work with Northwestern Polytechnical University from 2006 to 2009. She is currently a Professor with the School of Computer Science, Shaanxi Normal University, Xi'an. Her research interests include image processing, video analysis on educational big data, and mobile crowdsensing.

Liang Wang (Member, IEEE) received the B.S. degree in telecommunications engineering and the Ph.D. degree in communication and information systems from Xidian University, Xi'an, China, in 2009 and 2015, respectively.
From 2018 to 2019, he was a Visiting Scholar with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA. He is currently an Associate Professor with the School of Computer Science, Shaanxi Normal University, Xi'an. His research interests include Internet of Things, mobile edge computing, and applications of reinforcement learning.

Zhao Pei (Member, IEEE) received the B.E., M.S., and Ph.D. degrees from Northwestern Polytechnical University, Xi'an, China, in 2005, 2008, and 2013, respectively.
From 2010 to 2011, he was a joint Ph.D. Student with the Department of Computing Science, University of Alberta, Edmonton, AB, Canada. He is currently a Professor with the School of Computer Science, Shaanxi Normal University, Xi'an. His research interests include camera array synthetic aperture imaging, object detection and tracking, and human body motion analysis.

Jie Ren (Member, IEEE) received the Ph.D. degree in computer architecture from Northwest University, Xi'an, China, in 2017.
He is currently an Assistant Professor with the Computer Science Department, Shaanxi Normal University, Xi'an, China. His research interests include mobile system optimization, runtime scheduling, and contrastive learning in natural language processing.