Decision-Making Strategy On Highway For Autonomous Vehicles Using Deep Reinforcement Learning
October 8, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3022755
ABSTRACT Autonomous driving is a promising technology to reduce traffic accidents and improve driving efficiency. In this work, a deep reinforcement learning (DRL)-enabled decision-making policy is constructed for autonomous vehicles to address overtaking behaviors on the highway. First, a highway driving environment is established, wherein the ego vehicle aims to pass the surrounding vehicles with an efficient and safe maneuver. A hierarchical control framework is presented to control these vehicles, in which the upper level manages the driving decisions and the lower level supervises vehicle speed and acceleration. Then, a particular DRL method, the dueling deep Q-network (DDQN) algorithm, is applied to derive the highway decision-making strategy. The detailed calculation procedures of the deep Q-network and DDQN algorithms are discussed and compared. Finally, a series of simulation experiments is conducted to evaluate the effectiveness of the proposed highway decision-making policy. The advantages of the proposed framework in convergence rate and control performance are demonstrated. Simulation results reveal that the DDQN-based overtaking policy could accomplish highway driving tasks efficiently and safely.
INDEX TERMS Autonomous driving, decision-making, deep reinforcement learning, dueling deep
Q-network, deep Q-learning, overtaking policy.
FIGURE 1. The constructed deep reinforcement learning-enabled highway overtaking driving policy for autonomous vehicles.
process (POMDP) is used to construct the general decision-making framework [11], [12]. The authors in [13] developed an advanced ability to make appropriate decisions in city road traffic situations. The presented decision-making policy is based on multiple criteria, which helps city cars make feasible choices in different conditions. In Ref. [14], Nie et al. discussed the lane-changing decision-making strategy for connected automated cars. The related model combines cooperative car-following models with a candidate decision generation module. Furthermore, the authors in [15] introduced the idea of a human-like driving system, which could adjust the driving decisions by considering the driving demands of human drivers.

Deep reinforcement learning (DRL) techniques are regarded as a powerful tool to deal with long sequential decision-making problems [16]. In recent years, many attempts have been made to study DRL-based autonomous driving topics. For example, Duan et al. built a hierarchical structure to learn the decision-making policy via the reinforcement learning (RL) method [17]. The advantage of this work is that it does not rely on historical labeled driving data. Refs. [18], [19] utilized DRL approaches to handle the collision avoidance and path following problems for automated vehicles. The relevant control performance is better than that of conventional RL methods in these two studies. Furthermore, the authors in [20], [21] considered not only path planning but also fuel consumption for autonomous vehicles. The related algorithm is deep Q-learning (DQL), and it was proven to accomplish these two driving missions suitably. Han et al. employed the DQL algorithm to decide between lane changing and lane keeping for connected autonomous cars, in which the information of the nearby vehicles is treated as feedback knowledge from the network [22]. The resulting policy is able to promote traffic flow and driving comfort. However, common DRL methods are unable to address the highway overtaking problem because of the continuous action space and large state space [23].

In this work, a DRL-enabled highway overtaking driving policy is constructed for autonomous vehicles. The proposed decision-making strategy is evaluated and estimated to be adaptive to other complicated scenarios, as depicted in Fig. 1. First, the studied driving environment is founded on the highway, wherein an ego vehicle aims to run through a particular driving scenario efficiently and safely. Then, a hierarchical control structure is presented to manipulate the lateral and longitudinal motions of the ego and surrounding vehicles. Furthermore, the special DRL algorithm called dueling deep Q-network (DDQN) is derived and utilized to obtain the highway decision-making strategy. The DQL and DDQN algorithms are compared and analyzed theoretically. Finally, the performance of the proposed control framework is discussed via a series of simulation experiments. Simulation results reveal that the DDQN-based overtaking policy could accomplish highway driving tasks efficiently and safely.

The main contributions and innovations of this work can be cast into three perspectives: 1) an adaptive and optimal DRL-based highway overtaking strategy is proposed for automated vehicles; 2) the dueling deep Q-network (DDQN) algorithm is leveraged to address the large state space of the decision-making problem; 3) the convergence rate and control optimality of the derived decision-making policy are demonstrated by multiple designed experiments.

The organization of this article is as follows: the highway driving environment and the control modules of the ego and surrounding vehicles are described in Section II. The DQL and DDQN algorithms are defined in Section III.
FIGURE 3. The hierarchical control framework discussed in this work for the ego vehicle and surrounding vehicles.
Since the IDM is utilized to determine the longitudinal behavior, MOBIL is employed to make the lateral lane-change decisions [27]. MOBIL states that lane-changing behaviors should satisfy two restrictions, namely a safety criterion and an incentive condition. These constraints are related to the ego vehicle e, the follower i (of the ego vehicle) in the current lane, and the follower j in the target lane of the lane change. Assume a_i^old and a_j^old are the accelerations of these followers before the change, and a_i^new and a_j^new are the accelerations after the change.

The safety criterion requires the follower in the desired lane (after changing) to limit its acceleration to avoid a collision. The mathematical expression is:

a_j^new ≥ −b_safe (3)

where b_safe is the maximum braking imposed on the follower by the lane-changing behavior. By following (3), collisions and accidents can be avoided effectively.

The incentive condition is imposed on the ego vehicle and its followers through an acceleration threshold a_th:

a_e^new − a_e^old + z[(a_i^new − a_i^old) + (a_j^new − a_j^old)] > a_th (4)

where z is the politeness coefficient that determines the degree to which the followers affect the lane-changing behavior. This incentive condition means the desired lane should be safer than the old lane. For this application, the parameters in MOBIL are defined as follows: the politeness factor z is 0.001, the safe deceleration limit b_safe is 2 m/s², and the acceleration threshold a_th is 0.2 m/s². After the longitudinal and lateral behaviors are decided in the upper level, the lower level is applied to follow the target speed and lane.
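As a reading aid, the MOBIL test in (3) and (4) can be sketched in a few lines of Python; the function name and the way the before/after accelerations are obtained (e.g., from the IDM evaluated with and without the candidate lane change) are illustrative assumptions, not the exact implementation used in this work.

# Minimal sketch of the MOBIL lane-change test, eqs. (3) and (4).
# The before/after accelerations are assumed to be supplied by the
# longitudinal (IDM) model evaluated with and without the lane change.
Z_POLITENESS = 0.001   # politeness factor z
B_SAFE = 2.0           # safe deceleration limit b_safe [m/s^2]
A_TH = 0.2             # acceleration threshold a_th [m/s^2]

def mobil_lane_change_ok(a_e_old, a_e_new, a_i_old, a_i_new, a_j_old, a_j_new):
    """Return True if a candidate lane change satisfies both MOBIL constraints."""
    # Safety criterion (3): the new follower j must not brake harder than b_safe.
    if a_j_new < -B_SAFE:
        return False
    # Incentive condition (4): ego gain plus weighted follower gains must exceed a_th.
    incentive = (a_e_new - a_e_old
                 + Z_POLITENESS * ((a_i_new - a_i_old) + (a_j_new - a_j_old)))
    return incentive > A_TH

With the small politeness factor adopted here (z = 0.001), the incentive test is dominated by the ego vehicle's own acceleration gain.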
C. VEHICLE MOTION CONTROLLER
In the lower level, the motions of the vehicles in the longitudinal and lateral directions are controlled. The former regulates the acceleration by a proportional controller:

a = K_p (v_tar − v) (5)

where K_p is the proportional gain.

In the lateral direction, the controller handles the position and heading of the vehicle with a simple proportional-derivative action. For position control, the lateral speed v_lat of the vehicle is computed as follows:

v_lat = −K_p,lat Δ_lat (6)

where K_p,lat is the position gain and Δ_lat is the lateral position of the vehicle with respect to the center-line of the lane. Then, the heading control is related to the yaw-rate command ϕ̇ as:

ϕ̇ = K_p,ϕ (ϕ_tar − ϕ) (7)

where ϕ_tar is the target heading angle to follow the desired lane and K_p,ϕ is the heading gain.
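The lower-level controllers in (5)–(7) can be summarized by the following minimal sketch; the gain values and the simplified choice of the target heading are assumptions made only for illustration.

# Sketch of the lower-level motion controller, eqs. (5)-(7).
# Gains and the target-heading computation are illustrative assumptions.
K_P = 0.5       # longitudinal proportional gain K_p
K_P_LAT = 0.6   # lateral position gain K_p,lat
K_P_PHI = 1.0   # heading gain K_p,phi

def longitudinal_control(v, v_tar):
    """Eq. (5): acceleration command from the speed error."""
    return K_P * (v_tar - v)

def lateral_control(delta_lat, phi, lane_heading):
    """Eqs. (6)-(7): lateral-speed command, then yaw-rate command."""
    v_lat = -K_P_LAT * delta_lat          # (6): drive the lateral offset to zero
    phi_tar = lane_heading                # target heading aligned with the lane
    yaw_rate = K_P_PHI * (phi_tar - phi)  # (7)
    return v_lat, yaw_rate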
Hence, the movements of the surrounding vehicles are achieved by the bi-level control framework in Fig. 3. The position, speed, and acceleration of these vehicles are assumed to be known to the ego vehicle. This limitation propels the ego vehicle to learn how to drive in the scenario via the trial-and-error procedure. In the next section, the DRL approach is introduced and established to realize this learning process and derive the highway decision-making policy.
III. DRL METHODOLOGY
This section introduces the RL method and presents the special DRL algorithms. The interaction in RL between the agent and the environment is first explained. Then, the DQL algorithm that incorporates the neural network and the Q-learning algorithm is formulated. Finally, a dueling network is constructed in the DQL algorithm to reconstitute the output layer of the neural network, which yields the DDQN method.
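Before the formal definitions, the agent-environment interaction described above can be summarized by a schematic ε-greedy interaction loop; the environment object, its methods, and the tabular Q-value container below are generic placeholders for illustration, not the API of the simulator used later.

# Schematic RL interaction loop (agent <-> environment); placeholder API.
import random

def run_episode(env, q_values, epsilon=0.05):
    """Collect one episode of transitions with an epsilon-greedy policy."""
    transitions = []
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy selection over a discrete action set.
        if random.random() < epsilon:
            action = random.choice(env.actions)
        else:
            action = max(env.actions, key=lambda a: q_values[(state, a)])
        next_state, reward, done = env.step(action)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions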
FIGURE 4. The dueling network combining the state-value network and the advantage network for Q-table updating.
A(s, a) of each action. Therefore, the state-action function (Q table) is constituted as follows:

Q^π(s, a) = A^π(s, a) + V^π(s) (17)

It is obvious that the output of this new dueling network is also a Q table, and thus the neural network used in DQN can also be employed to approximate this Q table. The network with two sets of parameters is computed as:

Q^π(s, a; θ) = V^π(s; θ_1) + A^π(s, a; θ_2) (18)

where θ_1 and θ_2 are the parameters of the state-value function and the advantage function, respectively.

To update the Q table in DDQN and achieve the optimal control action, (18) is reformulated as follows:

Q^π(s, a; θ) = V^π(s; θ_1) + (A^π(s, a; θ_2) − max_{a′} A^π(s, a′; θ_2)) (19)

a* = arg max_{a′} Q(s, a′; θ) = arg max_{a′} A(s, a′; θ_2) (20)

It can be discerned that the input-output interfaces of DDQN and DQN are the same. Hence, the gradient descent in (16) can be reused to train the Q table in this work.
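A PyTorch-style sketch of the dueling head in (18)–(20) is given below; the shared feature layer and the overall architecture are assumptions for illustration (the 128-unit value and advantage branches follow the setting reported in Section III.D), not the exact network used in the experiments.

# Sketch of a dueling Q-network head, eqs. (18)-(20); architecture details are assumed.
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # V(s; theta_1) and A(s, a; theta_2) branches, each with a 128-unit layer.
        self.value = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, num_actions))

    def forward(self, state):
        x = self.feature(state)
        v = self.value(x)        # shape: (batch, 1)
        a = self.advantage(x)    # shape: (batch, num_actions)
        # Eq. (19): subtract the best advantage so V and A are identifiable.
        return v + a - a.max(dim=1, keepdim=True).values

# Eq. (20): the greedy action is the argmax over the Q (equivalently, advantage) outputs,
# e.g., action = net(state_batch).argmax(dim=1)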
D. VARIABLES SPECIFICATION
To derive the DDQN-based decision-making strategy, the preliminaries are initialized as follows, and the calculative procedure can easily be transferred to an analogous driving environment. The control actions are the longitudinal and lateral accelerations (a_1 and a_2) with units of m/s²:

a_1 ∈ [−5, 5] m/s² (21)
a_2 ∈ [−1, 1] m/s² (22)

Note that when these two accelerations are zero, the ego vehicle adopts an idling control.

After obtaining the acceleration actions, the speed and position of the vehicle can be computed as follows:

v_1^{t+1} = v_1^t + a_1 · Δt
v_2^{t+1} = v_2^t + a_2 · Δt (23)

d_1^t = v_1^t · Δt + (1/2) · a_1 · Δt²
d_2^t = v_2^t · Δt + (1/2) · a_2 · Δt² (24)

where v_1, v_2 are the longitudinal and lateral speeds of the vehicle, respectively, and similarly for d_1 and d_2. The policy frequency is 1 Hz, which indicates the time interval Δt is 1 second. It should be noted that (23) and (24) hold for the ego vehicle and the surrounding vehicles simultaneously, and these expressions are considered as the transition model P in RL. Then, the state variables are defined as the relative speed and distance between the ego and nearby cars:

Δd_t = d_t^ego − d_t^sur (25)
Δv_t = v_t^ego − v_t^sur (26)

where the superscripts ego and sur represent the ego vehicle and the surrounding vehicles, respectively.

Finally, the reward model R is constituted by the optimal control objectives, which are avoiding collisions, running as fast as possible, and trying to drive in lane 1 (L = 1). To bring this insight to fruition, the instantaneous reward function is defined as follows:

r_t = −1 · collision − 0.1 · (v_t^ego − v_max^ego)² − 0.4 · (L − 1)² (27)

where collision ∈ {0, 1}, and the goal of the DDQN-based highway decision-making strategy is to maximize the cumulative reward.

The proposed decision-making control policy is trained and evaluated in a simulation environment based on the OpenAI Gym Python toolkit [35]. The numbers of lanes and surrounding vehicles are 3 and 30, respectively. The discount factor γ and the learning rate α are 0.8 and 0.2, respectively. The layers of the value network and the advantage network both have a width of 128. The value of ε decreases from 1 to 0.05 over 6000 time steps. The number of training episodes for the different DRL approaches is 2000. The next section discusses the effectiveness of the presented decision-making strategy for autonomous vehicles.
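To make the variable definitions concrete, the following sketch evaluates the relative state features (25)–(26) and the instantaneous reward (27); the vehicle dictionaries and the default maximum speed are illustrative assumptions, and the actual training relies on the highway-env implementation [35].

# Sketch of the state features (25)-(26) and the reward model (27).
# Vehicle fields ("d", "v"), the lane index, and v_max are illustrative assumptions.
def relative_state(ego, surrounding):
    """Relative distance and speed between the ego vehicle and each nearby vehicle."""
    features = []
    for sur in surrounding:
        delta_d = ego["d"] - sur["d"]   # eq. (25)
        delta_v = ego["v"] - sur["v"]   # eq. (26)
        features.append((delta_d, delta_v))
    return features

def instantaneous_reward(collision, v_ego, lane_index, v_max=30.0):
    """Eq. (27): penalize collisions, speed shortfall, and distance from lane 1."""
    return (-1.0 * collision
            - 0.1 * (v_ego - v_max) ** 2
            - 0.4 * (lane_index - 1) ** 2)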
IV. RESULTS AND EVALUATION
In this section, the proposed highway decision-making policy is evaluated by comparing it with the benchmark methods. These techniques are the reference model in Fig. 3 and the common DQN in Section III.B. The optimality is analyzed by conducting a comparison of these three methods.
FIGURE 13. Another representative testing driving condition: the ego vehicle makes a dangerous lane change and a collision happens.

the ego vehicle has to slow down sometimes to avoid a collision. The ego vehicle also needs to change to other lanes to realize the overtaking process. Without loss of generality, two typical situations (two episodes, the A and B points in Fig. 10) are chosen to analyze the decision-making behaviors of the ego vehicle.

Fig. 12 depicts one driving situation in which there are three surrounding vehicles in front of the ego vehicle (the episode represented by point A). The ego vehicle has to execute the car-following maneuver for a long time and wait for the opportunity to overtake them. As a consequence, the vehicle speed may not reach the maximum value, and the ego vehicle may not surpass all the surrounding vehicles before the destination. Furthermore, an infrequent driving condition is described in Fig. 13 (the episode represented by point B). The ego vehicle attempts a risky lane change to obtain higher rewards. However, it crashes into nearby vehicles because the operating space is not sufficient. This situation may not appear in the training process, and thus the ego vehicle could cause a collision.

The detailed analysis of Figs. 12 and 13 suggests spending more time training the mutable decision-making strategy. These results also remind us that the relevant control

V. CONCLUSION
This paper discusses the highway decision-making problem using the DRL technique. By applying the DDQN algorithm in the designed driving environments, an efficient and safe control framework is constructed. Through a series of simulation experiments, the optimality, convergence rate, and adaptability are demonstrated. In addition, the testing results are analyzed, and the potential of the presented method for application in real-world environments is shown. Future work includes the online application of highway decision-making by executing hardware-in-the-loop (HIL) experiments. Moreover, a highway database collected in the real world can be used to estimate the related overtaking strategy.

REFERENCES
[1] A. Raj, J. A. Kumar, and P. Bansal, "A multicriteria decision making approach to study barriers to the adoption of autonomous vehicles," Transp. Res. Part A, Policy Pract., vol. 133, pp. 122–137, Mar. 2020.
[2] T. Liu, B. Tian, Y. Ai, L. Chen, F. Liu, and D. Cao, "Dynamic states prediction in autonomous vehicles: Comparison of three different methods," in Proc. IEEE Intell. Transp. Syst. Conf. (ITSC), Oct. 2019, pp. 3750–3755.
[3] A. Rasouli and J. K. Tsotsos, "Autonomous vehicles that interact with pedestrians: A survey of theory and practice," IEEE Trans. Intell. Transp. Syst., vol. 21, no. 3, pp. 900–918, Mar. 2020.
[4] C. Gkartzonikas and K. Gkritza, "What have we learned? A review of stated preference and choice studies on autonomous vehicles," Transp. Res. Part C, Emerg. Technol., vol. 98, pp. 323–337, Jan. 2019.
[5] C.-J. Hoel, K. Driggs-Campbell, K. Wolff, L. Laine, and M. J. Kochenderfer, "Combining planning and deep reinforcement learning in tactical decision making for autonomous driving," IEEE Trans. Intell. Vehicles, vol. 5, no. 2, pp. 294–305, Jun. 2020.
[6] C. Yang, Y. Shi, L. Li, and X. Wang, "Efficient mode transition control for parallel hybrid electric vehicle with adaptive dual-loop control framework," IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 1519–1532, Feb. 2020.
[7] C.-J. Hoel, K. Wolff, and L. Laine, "Tactical decision-making in autonomous driving by reinforcement learning with uncertainty estimation," 2020, arXiv:2004.10439. [Online]. Available: http://arxiv.org/abs/2004.10439
[8] SAE On-Road Automated Vehicle Standards Committee, "Taxonomy and definitions for terms related to on-road motor vehicle automated driving systems," SAE Standard J., vol. 3016, pp. 1–16, 2014.
[9] Y. Qin, X. Tang, T. Jia, Z. Duan, J. Zhang, Y. Li, and L. Zheng, "Noise and vibration suppression in hybrid electric vehicles: State of the art and challenges," Renew. Sustain. Energy Rev., vol. 124, May 2020, Art. no. 109782.
[10] P. Hart and A. Knoll, "Using counterfactual reasoning and reinforcement learning for decision-making in autonomous driving," 2020, arXiv:2003.11919. [Online]. Available: http://arxiv.org/abs/2003.11919
[11] W. Song, G. Xiong, and H. Chen, "Intention-aware autonomous driving decision-making in an uncontrolled intersection," Math. Problems Eng., vol. 2016, pp. 1–15, Apr. 2016.
[12] C. Yang, S. You, W. Wang, L. Li, and C. Xiang, "A stochastic predictive energy management strategy for plug-in hybrid electric vehicles based on fast rolling optimization," IEEE Trans. Ind. Electron., vol. 67, no. 11, pp. 9659–9670, Nov. 2020, doi: 10.1109/TIE.2019.2955398.
[13] A. Furda and L. Vlacic, "Enabling safe autonomous driving in real-world city traffic using multiple criteria decision making," IEEE Intell. Transp. Syst. Mag., vol. 3, no. 1, pp. 4–17, Spring 2011.
[14] J. Nie, J. Zhang, W. Ding, X. Wan, X. Chen, and B. Ran, "Decentralized cooperative lane-changing decision-making for connected autonomous vehicles," IEEE Access, vol. 4, pp. 9413–9420, 2016.
[15] L. Li, K. Ota, and M. Dong, "Humanlike driving: Empirical decision-making system for autonomous vehicles," IEEE Trans. Veh. Technol., vol. 67, no. 8, pp. 6814–6823, Aug. 2018.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, Feb. 2015.
[17] J. Duan, S. Eben Li, Y. Guan, Q. Sun, and B. Cheng, "Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data," IET Intell. Transp. Syst., vol. 14, no. 5, pp. 297–305, May 2020.
[18] M. Kim, S. Lee, J. Lim, J. Choi, and S. G. Kang, "Unexpected collision avoidance driving strategy using deep reinforcement learning," IEEE Access, vol. 8, pp. 17243–17252, 2020.
[19] Q. Zhang, J. Lin, Q. Sha, B. He, and G. Li, "Deep interactive reinforcement learning for path following of autonomous underwater vehicle," IEEE Access, vol. 8, pp. 24258–24268, 2020.
[20] C. Chen, J. Jiang, N. Lv, and S. Li, "An intelligent path planning scheme of autonomous vehicles platoon using deep reinforcement learning on network edge," IEEE Access, vol. 8, pp. 99059–99069, 2020.
[21] C. Yang, M. Zha, W. Wang, K. Liu, and C. Xiang, "Efficient energy management strategy for hybrid electric vehicles/plug-in hybrid electric vehicles: Review and recent advances under intelligent transportation system," IET Intell. Transp. Syst., vol. 14, no. 7, pp. 702–711, Jul. 2020.
[22] S. Han and F. Miao, "Behavior planning for connected autonomous vehicles using feedback deep reinforcement learning," 2020, arXiv:2003.04371. [Online]. Available: http://arxiv.org/abs/2003.04371
[23] S. Nageshrao, H. E. Tseng, and D. Filev, "Autonomous highway driving using deep reinforcement learning," in Proc. IEEE Int. Conf. Syst., Man Cybern. (SMC), Oct. 2019, pp. 2326–2331.
[24] T. Liu, B. Huang, Z. Deng, H. Wang, X. Tang, X. Wang, and D. Cao, "Heuristics-oriented overtaking decision making for autonomous vehicles using reinforcement learning," IET Elect. Syst. Transp., vol. 1, no. 99, pp. 1–8, 2020.
[25] M. Treiber, A. Hennecke, and D. Helbing, "Congested traffic states in empirical observations and microscopic simulations," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 62, no. 2, pp. 1805–1824, Aug. 2000.
[26] M. Zhou, X. Qu, and S. Jin, "On the impact of cooperative autonomous vehicles in improving freeway merging: A modified intelligent driver model-based approach," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 6, pp. 1422–1428, Jun. 2017.
[27] A. Kesting, M. Treiber, and D. Helbing, "General lane-changing model MOBIL for car-following models," Transp. Res. Rec., J. Transp. Res. Board, vol. 1999, no. 1, pp. 86–94, Jan. 2007.
[28] T. Liu, X. Hu, W. Hu, and Y. Zou, "A heuristic planning reinforcement learning-based energy management for power-split plug-in hybrid electric vehicles," IEEE Trans. Ind. Informat., vol. 15, no. 12, pp. 6436–6445, Dec. 2019.
[29] T. Liu, X. Tang, H. Wang, H. Yu, and X. Hu, "Adaptive hierarchical energy management design for a plug-in hybrid electric vehicle," IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 11513–11522, Dec. 2019.
[30] X. Hu, T. Liu, X. Qi, and M. Barth, "Reinforcement learning for hybrid and plug-in hybrid electric vehicle energy management: Recent advances and prospects," IEEE Ind. Electron. Mag., vol. 13, no. 3, pp. 16–25, Sep. 2019.
[31] T. Liu, H. Yu, H. Guo, Y. Qin, and Y. Zou, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Trans. Ind. Informat., vol. 15, no. 7, pp. 4352–4361, Jul. 2019.
[32] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Hoboken, NJ, USA: Wiley, 2014.
[33] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[34] Z. Wang, T. Schaul, M. Hessel, H. Van Hasselt, M. Lanctot, and N. De Freitas, "Dueling network architectures for deep reinforcement learning," in Proc. ICML, Jun. 2016, pp. 1995–2003.
[35] L. Edouard. (2018). An Environment for Autonomous Driving Decision-Making. GitHub. [Online]. Available: https://github.com/eleurent/highway-env

JIANGDONG LIAO received the M.S. degree in mathematics and computer engineering from Chongqing Normal University. He works in mathematics and statistics with Yangtze Normal University. His current research interests include driving behavior analysis, vehicle motion prediction, and risk assessment of autonomous driving.

TENG LIU (Member, IEEE) received the B.S. degree in mathematics and the Ph.D. degree in automotive engineering from the Beijing Institute of Technology (BIT), Beijing, China, in 2011 and 2017, respectively. His Ph.D. dissertation, under the supervision of Prof. F. Sun, was entitled Reinforcement Learning-Based Energy Management for Hybrid Electric Vehicles.
He was a Research Fellow with Vehicle Intelligence Pioneers Ltd., from 2017 to 2018. He was also a Postdoctoral Fellow with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, Canada, from 2018 to 2020. He is currently a Professor with the Department of Automotive Engineering, Chongqing University, Chongqing, China. He has more than eight years' research and working experience in renewable vehicles and connected autonomous vehicles. He has published over 40 SCI articles and 15 conference papers in these areas. His current research interests include reinforcement learning (RL)-based energy management in hybrid electric vehicles, RL-based decision making for autonomous vehicles, and CPSS-based parallel driving. He is a member of the IEEE VTS, the IEEE ITS, the IEEE IES, the IEEE TEC, and the IEEE/CAA. He received the Merit Student of Beijing in 2011, the Teli Xu Scholarship (Highest Honor) from the Beijing Institute of Technology in 2015, the Top 10 from the IEEE VTS Motor Vehicle Challenge in 2018, and the Sole Outstanding Winner from the ABB Intelligent Technology Competition in 2018. He serves as the Workshop Co-Chair for the 2018 IEEE Intelligent Vehicles Symposium (IV 2018). He serves as a Reviewer for multiple SCI journals, including the IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, the IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, the IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, and Advances in Mechanical Engineering.
XIAOLIN TANG (Member, IEEE) received the B.S. degree in mechanics engineering and the M.S. degree in vehicle engineering from Chongqing University, Chongqing, China, in 2006 and 2009, respectively, and the Ph.D. degree in mechanical engineering from Shanghai Jiao Tong University, China, in 2015. From August 2017 to August 2018, he was a Visiting Professor with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON, Canada. He is currently an Associate Professor with the Department of Automotive Engineering, Chongqing University. He has led and been involved in more than ten research projects, such as the National Natural Science Foundation of China. He has published more than 30 articles. His research interests include hybrid electric vehicles (HEVs), vehicle dynamics, noise and vibration, and transmission control. He is a Committeeman with the Technical Committee on Vehicle Control and Intelligence, Chinese Association of Automation (CAA).

XINGYU MU received the B.S. degree in automotive engineering from Chongqing University, where he is currently pursuing the M.S. degree. His current research interest includes the left-turn decision-making problem of autonomous driving at intersections.

BING HUANG received the B.S. degree in automotive engineering from Chongqing University, where he is currently pursuing the M.S. degree. His current research interest includes decision-making for autonomous driving.

DONGPU CAO received the Ph.D. degree from Concordia University, Canada, in 2008. He is currently an Associate Professor and the Director of the Driver Cognition and Automated Driving (DC-Auto) Laboratory, University of Waterloo, Canada. He has contributed more than 170 publications. He holds one U.S. Patent. His research interests include vehicle dynamics and control, driver cognition, and automated driving and parallel driving. He has been serving on the SAE International Vehicle Dynamics Standards Committee and the ASME, SAE, and IEEE technical committees. He received the ASME AVTT'2010 Best Paper Award and the 2012 SAE Arch T. Colwell Merit Award. He serves as the Co-Chair for the IEEE ITSS Technical Committee on Cooperative Driving. He serves as an Associate Editor for the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, the IEEE/ASME TRANSACTIONS ON MECHATRONICS, the IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, and the Journal of Dynamic Systems, Measurement and Control (ASME). He serves as a Guest Editor for Vehicle System Dynamics and the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS.