Previous Article in Journal
A Terminal Residual Vibration Suppression Method of a Robot Based on Joint Trajectory Optimization
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Game-Theoretic Adversarial Interaction-Based Critical Scenario Generation for Autonomous Vehicles

1
Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
2
University of Science and Technology of China, Hefei 230026, China
3
Anhui Engineering Laboratory for Intelligent Driving Technology and Application, Hefei 230031, China
4
Innovation Research Institute of Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Hefei 230031, China
*
Author to whom correspondence should be addressed.
Submission received: 13 June 2024 / Revised: 17 July 2024 / Accepted: 1 August 2024 / Published: 6 August 2024
(This article belongs to the Section Automation and Control Systems)

Abstract

:
Ensuring safety and efficiency in the rapidly advancing autonomous vehicle (AV) industry presents a significant engineering challenge. Comprehensive performance evaluations and critical scenario testing are essential for identifying situations that AVs cannot handle. Thus, generating critical scenarios is a key problem in AV testing system design. This paper proposes a game-theoretic adversarial interaction method to efficiently generate critical scenarios that challenge AV systems. Initial motion prediction for adversarial and surrounding vehicles is based on kinematic models and road constraints, establishing interaction action spaces to determine possible driving domains. A novel evaluation approach combines reachability sets with adversarial intensity to assess collision risks and adversarial strength for any state, used to solve behavior values for each interaction action state. Further, equilibrium action strategies for the vehicles are derived using Stackelberg game theory, yielding optimal actions considering adversarial interactions in the current traffic environment. Simulation results show that the adversarial scenarios generated by this method significantly increase incident rates by 158% to 1313% compared to natural driving scenarios, while ride comfort and driving efficiency decrease, and risk significantly increases. These findings provide critical insights for model improvement and demonstrate the proposed method’s suitability for assessing AV performance in dynamic traffic environments.

1. Introduction

1.1. Motivation

Autonomous vehicles (AVs) present a promising solution for enhancing traffic efficiency and safety [1,2]. However, recent incidents during both testing and actual deployment of AVs [3] have raised significant safety concerns. Therefore, ensuring the safe operation of AV systems in various complex and dynamic traffic scenarios is a critical prerequisite for advancing autonomous driving technology.
Comprehensive and rigorous testing is essential to guarantee the safety of AVs. Traditional testing methods typically depend on predefined scenarios and the expertise of human specialists. These methods are not only time-consuming and labor-intensive but also inadequate in covering all possible situations, especially extreme or rare scenarios. Hence, the efficient generation of critical scenarios that can robustly challenge AV systems has become a focal point of current research.

1.2. Related Work

In recent years, with the proliferation of deep generative models, evaluating autonomous vehicles (AVs) in artificially generated scenarios has become a mainstream approach [4]. Significant breakthroughs have been made in testing AVs in natural driving environments [5,6,7]; however, critical safety scenarios rarely occur in the real world. This makes it difficult to identify failure modes of AVs and improve models accordingly. Therefore, there is an urgent need for research on the generation of safety-critical scenarios.
According to the literature [8], methods for generating safety-critical scenarios can be classified into three categories: data-driven generation, knowledge-based generation, and adversarial generation.
Data-driven generation methods use information from collected datasets to generate scenarios. Since driving scenarios usually follow a distribution, collected data can be used to train density models to fit this distribution [9]. For instance, importance sampling theory is used in [10] to evaluate critical driving events, and the cross-entropy method is employed to find importance sampling functions within pre-designed parameter combinations. Huang et al. [11] also use importance sampling theory to generate critical interactions between different vehicles, accelerating intelligent driving tests, and establishing a piecewise mixed distribution model to simulate the maneuvers of leading vehicles.
Knowledge-based generation methods use predefined rules or explicit knowledge to guide the generation of critical scenarios. For example, in [12], prior knowledge is used to design random risk scenarios, and reinforcement learning agents are trained in various scenarios to achieve comparable results. In [13], explicit knowledge is encoded as first-order logic [14] and embedded into a tree structure to enable controlled generation. Reference [15] extracts various global road data from OpenStreetMap [16] and employs the SUMO [17] traffic simulator to generate safety-critical scenarios for these road networks.
Adversarial generation is another popular research topic, aimed at obtaining specific scenarios to evaluate the robustness of AVs [4,18]. The main purpose of adversarial generation is to degrade the performance of the target model or create safety risks. For example, ref. [19] proposes fusion and parallel attack models to attack AV perception algorithms, making intelligent driving systems dangerous. Xu et al. [20] use iterative projection gradient algorithms to attack semantic segmentation models. Kong et al. [21] propose using PhysGAN to generate physically robust adversarial examples that continuously mislead AVs. Additionally, ref. [22] adopts generative adversarial imitation learning models to generate adversarial driving behaviors. Reference [23] uses deep reinforcement learning to generate adversarial environments. Reference [24] proposes a scenario generation algorithm that frames the adversarial objective as a black-box attack, testing lidar-based autonomous systems by disrupting the trajectories of actors.
These methods can generate adversarial interaction scenarios to interfere with the tested AV systems and evaluate their performance. However, data-driven generation methods require a large number of prior accident scenarios, which are often lacking. Knowledge-based methods heavily depend on knowledge and experience, making it challenging to handle diverse and complex scenarios. Although adversarial methods can quickly generate numerous safety-critical scenarios, they primarily use machine learning and deep learning theories, requiring extensive data training. This process is inherently a black-box model, leading to many unreasonable adversarial behaviors.

1.3. Contributions

To address the dependency on original data and the issue of unreasonable adversarial behaviors in current adversarial scenario generation methods, this paper focuses on the behavior of adversarial vehicles (Advs) and proposes a game-theoretic method for generating critical scenarios for autonomous vehicles. The main contributions of this paper are as follows:
(1)
Construction of Game-Theoretic Adversarial Interaction Behaviors: By introducing game theory, a decision-making algorithm specifically targeting Advs is designed, make adversarial behaviors more reasonable and human-like, thereby avoiding unreasonable adversarial actions.
(2)
Quantitative Grading of Adversarial Intensity: Drawing on the concept of reachability sets, the adversarial intensity of scenarios is quantitatively graded to test AVs with different capabilities. By quantifying adversarial intensity, the performance of AV systems under different levels of adversarial challenges can be more accurately assessed.
(3)
Comprehensive Evaluation and Testing Efficiency Improvement: In addition to safety, the evaluation metrics include the comfort and driving performance of autonomous vehicles, providing a more comprehensive assessment system. The proposed method enables the intuitive generation of a large number of critical scenarios, significantly enhancing the testing efficiency of autonomous driving systems.

1.4. Structure of the Paper

The overall structure of this paper is as follows:
Section 2: Model Establishment and Preparation: This section provides a detailed introduction to the modeling methods for vehicle state sequences and trajectory predictions during multi-agent game interactions. It includes the construction of kinematic models for Adv motion trend predictions and the establishment of interaction action spaces.
Section 3: Establishment of Evaluation Functions: This section describes the design of various action payoff functions, including the evaluation of vehicle driving performance, the establishment of risk evaluation functions, and the design of a quantitative model for adversarial intensity.
Section 4: Solution of the Adversarial Interaction Decision-Making Model: Based on the evaluation functions established in Section 3, this section further considers the process of interaction games and derives the equilibrium actions for two vehicles using Stackelberg game theory.
Section 5: Experiments: This section verifies the rationality and effectiveness of the proposed game-theoretic adversarial interaction-based critical scenario generation method through the design of adversarial interaction scenarios. It includes experiments on adversarial interaction capabilities, validation of graded adversarial capabilities, and the generation of critical test scenarios.
Section 6: Conclusions: This section summarizes the main research work and findings of this paper and proposes directions for future research.

2. Model Establishment and Preparation

In the process of multi-agent interactions among AVs, vehicle state sequences and trajectory predictions are essential. To extract these profiles from the vehicle–road system, the preparation work mainly includes: Kinematic Modeling: This involves obtaining the transformation from action states (throttle, braking, and steering control) to vehicle trajectory states. Physics-Based Surrounding Vehicle Trajectory Prediction: This method reflects the future trajectory trends of various vehicles in the traffic scene. Construction of Interaction Action Spaces: This represents the possible driving domains of AVs within the decision-making time horizon, providing intent choices for the interactive game decision-making process among multiple AVs.

2.1. Construction of the Kinematic Model for Adversarial Vehicles

To derive the equations governing the changes in vehicle state based on control inputs, dynamic models and bicycle models have been widely used, as seen in [25,26]. For motion planning, a particle motion model is sufficient to generate smooth and continuous trajectories. The particle motion model is established in the Frenet coordinate system, as shown in Figure 1.
Typically, the trajectory of a moving vehicle can be expressed as a function of longitudinal and lateral positions ( s t , l t ) over time, from which states such as speed, acceleration, and heading angle can be derived. These are represented by the equations:
s t = f ( t ) l t = g ( t )
where f ( t ) and g ( t ) are functions of the longitudinal and lateral positions in the Frenet coordinate system, respectively. Due to the smoothness of the trajectory, the position functions can be expanded using the Taylor series. Additionally, as the errors are sufficiently small and continuously re-planned, higher-order terms are maintained as second-order, as shown in Equation (2).
s t + 1 l t + 1 = f ( t ) g ( t ) + f ˙ ( t ) g ˙ ( t ) T + f ¨ ( t ) g ¨ ( t ) T 2 2 !
where T is the planning period. Each of these can be expressed in terms of actual states as follows.
f ( t ) = s t ; g ( t ) = l t ; f ˙ ( t ) = v t c o s φ t ; g ˙ ( t ) = v t s i n φ t f ¨ ( t ) = a t τ c o s φ t a t n s i n φ t ; g ¨ ( t ) = a t τ s i n φ t + a t n c o s φ t
where v t is speed, φ t is heading angle, a t τ is longitudinal acceleration determined by the throttle and brake, and a t n is lateral acceleration determined by the steering, which defines the curvature of the path.
Thus, from the inputs of longitudinal and lateral accelerations ( a t τ , a t n ), the vehicle state s t + 1 , l t + 1 , v t + 1 , φ t + 1 can be updated as follows:
s t + 1 l t + 1 = s t l t + T v t c o s φ t v t s i n φ t + T 2 2 ! c o s φ t s i n φ t s i n φ t c o s φ t a t τ a t n v t + 1 φ t + 1 = v t φ t + T 1 0 0 1 / v t a t τ a t n
Additionally, the positions, speed, and heading angle s t , l t , v t , φ t in the Frenet coordinate system can be converted to the global coordinate system x t , y t , v t o , ϕ t based on the road reference information x t r , y t r , ϕ t r , k t r .
x t = x t r l t s i n ϕ t r y t = y t r + l t c o s ϕ t r v t o = v t c o s φ t 1 k t r l t 2 + v t s i n φ t 2 ϕ t = ϕ t r + φ t
where x t r , y t r , ϕ t r , k t r represent the longitudinal and lateral positions, heading angle, and curvature in the global coordinate system, respectively, as shown in Figure 1. This abstraction process enables the motion planning decisions to be transformed into specific sequences of longitudinal and lateral accelerations.

2.2. Motion Trend Prediction

In complex traffic scenarios, predicting a vehicle’s future trajectory is influenced not only by its own motion constraints but also by the driver’s intent, making accurate trajectory prediction challenging. This section first uses a motion prediction model to anticipate the motion trends of surrounding vehicles, then combines these with game theory to infer specific driving intentions.
When predicting the future trajectories of surrounding vehicles, the main information from sensors such as radar, lidar, and cameras includes:
Δ s t , Δ l t , v t s , a t s , Δ φ t , w t s , α t s
where Δ s t and Δ l t are the relative positions, v t s and a t s are the speed and acceleration of the surrounding vehicle, respectively, Δ φ t is the relative heading angle, w t s is the yaw rate, and α t s is the lateral acceleration of the surrounding vehicle.
Therefore, based on the sensor data, the future state of the surrounding vehicle in the Frenet coordinate system can be expressed as:
s t s = s t + Δ s t ; l t s = l t + Δ l t ; φ t s = φ t + Δ φ t a t s τ = a t s ; a t s n = w t s v t s
where ( s t s , l t s ) are the future positions of the surrounding vehicle (SV). φ t s is the future heading angle, and ( a t s τ , a t s n ) are the longitudinal and lateral accelerations of the SV, respectively.
Due to the inherent uncertainties in these conditions, directly predicting the trajectory based on sensor information is challenging. Therefore, this paper separately predicts the longitudinal and lateral trajectories and then combines them to form the overall trajectory.
Typically, the motion of a vehicle stabilizes after a single driving maneuver. At the endpoint t + N p , the lateral velocity and acceleration tend to zero. The initial position, lateral position, speed, and acceleration are subject to constraints. The five boundary conditions at the initial and endpoint can be used to fit a fourth-degree polynomial for the lateral trajectory as follows:
l t = a 0 + a 1 t + a 2 t 2 + a 3 t 3 + a 4 t 4   s . t .   l t = l t s , l ˙ t = v t s s i n φ t s , l ¨ t = a t s _ τ s i n φ t s + a t s _ n c o s φ t s l ˙ t + N p = 0 , l ¨ t + N p = 0
where a i ( i = 0 ~ 4 ) are the fitting coefficients of the fourth-degree polynomial used to fit the lateral motion trajectory.
Similarly, the longitudinal trajectory can also be represented by a polynomial. The initial position constraints are the same. At t + N p , only the acceleration is constrained:
s t = b 0 + b 1 t + b 2 t 2 + b 3 t 3   s . t .   s t = s t s , s ˙ t = v t s c o s φ t s , s ¨ t = a t s _ τ c o s φ t s a t s _ n s i n φ t s s ¨ t + N p = 0
where b i ( i = 0 ~ 3 ) are the polynomial coefficients representing the longitudinal trajectory.
The longitudinal and lateral trajectories predicted based on the kinematic model are obtained separately. Since the path and speed are planned in a coupled manner, the longitudinal and lateral motion trajectories need to be coupled.
For the interval [ t , t + N p ] , the states can be coupled as follows using functions s and l .
v t + i s = v t + i s s 2 + v t + i s l 2 φ t + i s = a r c t a n v t + i s l / v t + i s s a t + i s τ = a t + i s l s i n φ t + i s + a t + i s s c o s φ t + i s a t + i s n = a t + i s l c o s φ t + i s a t + i s s s i n φ t + i s
where v t + i s s = s ˙ t + i ; v t + i s l = l ˙ t + i ; a t + i s s = s ¨ t ; a t + i s l = l ¨ t .
Thus, the perceived state can derive the local trajectories of surrounding vehicles and the ego vehicle. The prediction results for three positions are shown in Figure 2. The predicted trajectories reflect the current motion trends.

2.3. Construction of Interaction Action Spaces

After roughly predicting the trajectories of the surrounding vehicles using the motion trend prediction model, it is necessary to represent the potential driving regions of both the vehicle and surrounding vehicles within the decision-making horizon to facilitate further interaction.
Typically, human drivers’ behaviors include acceleration, deceleration, lane change (left or right), and maintaining current behavior. As shown in Figure 3, the motion space starts with five basic action sequences. Here, B 1 is the acceleration sequence, B 2 and B 4 are the lane change sequences (left and right, respectively), B 3 is the deceleration sequence, and B 5 is the sequence maintaining the current behavior. B 5 is based on the polynomial model described in Equations (8)–(10), representing the corresponding endpoints for maintaining the current driving intention.
Taking B 5 S t + N p , L t + N p as the baseline, the endpoints for the acceleration, deceleration, and lane change actions can be derived for B 1 , B 3 , and B 2 , B 4 as follows:
B 1 B 5 = s t + v t N p + a m a x N p 2 / 2 S t + N p B 3 B 5 = S t + N p s t + v t N p a m i n N p 2 / 2 B 2 B 5 = l u p l t + N p B 4 B 5 = l t + N p l d o w n
where N p is the planning time horizon, a m a x is the maximum acceleration (approximately 2   m / s 2 ), and a m i n is the deceleration (approximately 3   m / s 2 ). For lane-changing actions, the target points will always be on the center of the lane. l u p and l d o w n are the positions of the upper and lower lane boundaries, respectively.
For AVs, a macro path is usually predetermined. AVs follow this macro path for local trajectory planning, using Frenet coordinates and polynomial motion predictions similar to Equation (11) to generate five target points. Based on these predicted trajectories, actions leading to road departure or collisions are filtered out, resulting in the feasible action space A = A 1 , A 3 , A 4 , A 5 . These feasible actions are then transformed into the global coordinate system for evaluation to determine the optimal trajectory.

3. Establishment of Evaluation Functions

The previous section predicted the future trajectories of vehicles and constructed the interaction action spaces between the AV and surrounding vehicles. This section will evaluate the benefits of the corresponding actions in the interaction action spaces.

3.1. Design of Action Payoff Functions

When AVs take actions in the interaction space, an immediate payoff R t is generated. The payoff function is designed to relate to the driving performance D P t and the adversarial penalty A P t between the AV and surrounding vehicles:
R t = D P t + A P t D P t = F t + E t C t
Common driving performance metrics D P t include safety, efficiency, and comfort. Here, F t represents the safety measure against surrounding vehicles; E t denotes the efficiency, represented by the speed deviation v t v p between the AV’s speed v t and the desired speed v p ; C t is the cost associated with taking the action.
For the comfort cost C t , lane-changing actions generally have higher costs, while acceleration and deceleration actions have relatively lower costs. The specific cost can be expressed as the deviation of the action’s endpoint from the reference action’s endpoint:
C t = λ 1 x d x t + N p + λ 2 y d y t + N p
where x d , y d are the trajectory endpoints corresponding to the decision action; x t + N p , y t + N p are the predicted trajectory endpoints; λ 1 and λ 2 are the cost coefficients for acceleration/deceleration and lane-changing actions, respectively.

3.2. Risk Assessment Functions

Evaluating risk in traffic scenarios involves multiple vehicle behaviors and varying conditions. To account for position, posture, and speed, this paper introduces the road risk assessment function F m to evaluate risks between vehicles. Combined with the static road risk function F s , a comprehensive risk assessment of the AV is performed.
Firstly, risks based on relative positions and speeds can be calculated using the safety distance model. As shown in Figure 4, r e is the distance from the AV’s front to the edge of the vehicle ahead, and D s is the safe distance between the two vehicles, expressed as:
D s = v e 2 v a 2 2 a m i n + v e ς + G
where v e is the current speed of the AV, v a is the speed of the Adv, a m i n is the deceleration, G is the minimum distance between the two vehicles when stopped, and ς is the response time of the AV. v e > v a indicates that the AV requires a longer braking distance to avoid colliding with the vehicle ahead, thus increasing risk. This is because, given the higher speed of the AV, braking must commence sooner. When v e v a , the AV’s speed is lower, and the safe distance between vehicles decreases, hence the risk is lower. The safe distance D s relative to the separation r e expresses the risk level between the vehicles.
When vehicles are in the same direction but not in the same lane, the primary risk comes from lateral displacement. Thus, the risk from surrounding vehicles to the AV is mainly due to the AV’s speed v e and its lateral deviation relative to the adversarial vehicle’s speed v a , expressed as:
D s = v a 2 v e c o s Δ φ 2 s g n ( c o s Δ φ ) 2 a m i n + v a ς + G         = k 1 v a 2 v e c o s Δ φ 2 s g n ( c o s Δ φ ) + b 1
where Δ φ s the lateral deviation between the AV and the adversarial vehicle, Δ φ = φ e φ a ; s g n ( · ) is the sign function, and s g n ( c o s   Δ φ ) determines whether the speeds are aligned. Constants k 1 and b 1 represent the safety factor and distance constant, respectively. Assuming the vehicle speed in urban areas is 16   m / s , with a deceleration a m i n = 5   m / s 2 and a reaction time ς of 0.8   s , and the stopping distance G = 4   m , the safety factor can be derived as k 1 = 0.1 , and the distance constant as b 1 = 17 .
The risk assessment function F m for the AV and adversarial vehicle in the same direction is based on their safety distance. When the relative distance r e exceeds the safe distance D s , the vehicles are considered safe with zero risk. As the distance decreases, the risk increases, influenced by road conditions, and is expressed as:
F m = p o s l i n R a D s r e 1 = 0 r e R a D s R a D s r e 1 other
The function p o s l i n ( ) is a positive linear function, retaining only positive values. R a is the relative risk factor, which increases with worsening road conditions. The specific values are referenced from [27], as shown in Table 1.
Additionally, when the vehicles are not in the same direction, the distance r e is extended to include the elliptical safety field, as shown in Equation (17):
F m = p o s l i n R a k 1 v a 2 v e c o s Δ φ 2 s g n ( c o s Δ φ ) s g n c o s θ a + b 1 r e 1
where θ a is the angle between the distance r e and the surrounding vehicle’s speed direction; s g n c o s θ a determines if the surrounding vehicle is in front of or behind the AV.
For the distance vector r e , collisions typically occur laterally, making it reasonable to add a relaxation factor ξ to form an elliptical safety field. When the sensor is at the AV’s center, the distance vector is expressed as:
r e = x a x e l e / 2 , ξ y a y e w e / 2
where ξ is the relaxation factor determining the horizontal spread of the elliptical field. typically set to 2. l e and w e are the length and width of the AV, respectively, with the sensor at the vehicle’s center.
Additionally, when multiple vehicles interact with the AV, the risk posed by the AV is defined as the maximum of the individual risks:
F m = m a x F 1 m , F 2 m , , F k m
where F i m ( i = 1 ~ k ) represents the risk posed by each surrounding vehicle. The highest risk indicates the most dangerous interacting vehicle.
The risk corresponding to this function can be visualized, as shown in Figure 5, illustrating the collision risk posed by surrounding vehicles to the AV. Similarly, the risk posed by the AV to surrounding vehicles can be calculated using this method.
Risk assessment must consider both the interactions with surrounding vehicles and the static road regions. This includes assessing the road centerline, lane positions, and road boundaries. Safe positions, where the AV can drive without violating road regulations, are denoted by set B . The risk at any road position g B is represented by U r , given by:
U r ( g ) = a r b r i = 1 M lane     e x p g x center , i T Σ r 1 g x center , i
where a r and b r are coefficients, x center , i is the lane center position, and Σ r is the covariance matrix.
When a vehicle deviates from the road g B , collision risk increases significantly. For off-road areas, a high-risk factor is applied. The comprehensive road risk function, integrating on-road and off-road risks, is defined as:
F s = U r ( g ) g B c r η r g x edge   2 + 1 + τ r g B
where c r is a scaling factor used to adjust the maximum risk cost when the vehicle deviates from the road, η r adjusts boundary risk, τ r is the boundary adjustment parameter, and x edge is the road edge coordinate.
To better visualize road risks, Figure 6 shows the results of the road risk construction. This figure illustrates the risk distribution on a three-lane road. Risk levels are indicated by colors: yellow for high risk, green for medium risk, and blue for low risk.
In summary, the overall risk F is derived from the road risk assessment function F m and the static road risk function F s :
F = α 1 F m + α 2 F s

3.3. Quantitative Model of Adversarial Intensity

In the development of autonomous driving systems, calculating the reachable set is crucial for ensuring safety and efficiency. The reachable set encompasses all possible states a vehicle can achieve within a certain time frame from a given initial state and control inputs. This section employs a fast marching method [28] to calculate the reachable set under various conditions: without obstacles, and considering obstacles and future trajectories of surrounding vehicles. The ratio of the online reachable set to the offline reachable set represents the adversarial intensity faced by the AV in different traffic scenarios.
The mathematical definition of the reachable set is as follows:
r e a c h t X 0 , U , F ( t ) : = x t ; u ( ) , x 0 x 0 X 0 , τ t 0 , t h : u ( τ ) U , p r o j x τ ; u ( ) , x 0 F ( τ )
x t ; u ( ) , x 0 = x 0 + t 0 t   f ( x ( τ ) , u ( τ ) ) d τ
where x t ; u ( ) , x 0 R n represents the vehicle state at time t , and u ( ) represents the control inputs of the vehicle. According to the kinematic model of adversarial vehicle motion from Section 2.1, u ( t ) = ( a t τ , a t n ) U , where U is the set of all possible control inputs. f ( x , u ) is the dynamic model from Equation (4). The state x ( t ) consists of multiple variables s t , l t , v t , φ t , x R n , which need to be projected into a two-dimensional space, so we define the projection operator p r o j ( x ) : R n R 2 . From the perspective of the vehicle’s safety, F ( τ ) represents road obstacles and forbidden areas. Therefore, Equation (23) defines the reachable set starting from the initial state x 0 over the time interval t t 0 , t h , avoiding a set of forbidden states F ( t ) .
From Equation (23), we know the reachable set evolves over time and is calculated incrementally. Let t k = k Δ t , k N is an integer. Thus, the evolution of the reachable set over time can be expressed as:
R k + 1 : = r e a c h t k + 1 R k , U , F ( t )
To discretize the reachable set, we partition the position space into a uniform grid of non-overlapping cells aligned with the coordinate axes:
C k ( i ) = c _ ( i ) , c ( i ) R 2 ,   i = 0 n c   C k ( i ) p r o j R k
where i refers to the i - t h cell. We construct a directed graph G = ( V , E ) , where nodes v k , i V correspond to cells in the reachable region C k ( i ) at time step k , An edge v k , i , v k + 1 , j E represents the trajectory from one cell to another in the next time step.
The calculation diagram of the offline reachable set and the online reachable set is shown in Figure 7.
Calculation of Offline Reachable Sets
Offline reachable sets are calculated without considering obstacles or other vehicles. This represents all possible states a vehicle can theoretically reach without any interference. Let R ˆ k denote the offline reachable set at time k.
The calculation of offline reachable sets includes propagation, discretization, and graph construction.
Propagation: Based on vehicle dynamics, the longitudinal and lateral accelerations a t τ [ a t τ _ , a t τ ¯ ] and a t n [ a t n _ , a t n ¯ ] form the control input set U . a _ and a ¯ represent the lower and upper bounds of the input control quantity. Thus, we can compute the reachable set r e a c h t k ( 0 , U , ) for the initial state at the origin. For further propagation, the reachable set R ˆ k for any initial state x 0 is:
R ˆ k = r e a c h t k x 0 , U , = r e a c h t k x 0 , 0 , r e a c h t k ( 0 , U , ) R k
where the symbol represents the Minkowski sum. Since the offline reachable set does not exclude any forbidden states, the propagation step is given by: R ˆ k + 1 = r e a c h t k R ˆ k , U , , F ( t ) = . Standard reachability analysis tools such as CORA [29], Flow* [30], ARCH-COMP23 [31], and SpaceEx [32] can be used for this computation.
Discretization and Graph Construction: For ease of computation and storage, the continuous reachable set is discretized into a grid or cells, each representing a possible state region. The reachable set R ˆ k is divided into non-overlapping subsets R ˆ k = { R k ( 1 ) , , R k ( i ) } . Using Equation (26), we compute the propagation of each cell to obtain:
i I : R ˆ k + 1 ( i ) = C ( i ) × s ˙ k ( i ) _ , s ˙ k ( i ) ¯ × l ˙ k ( i ) _ , l ˙ k ( i ) ¯ , I = i R ˆ k + 1 C ( i )
where s ˙ k ( i ) _ , s ˙ k ( i ) ¯ and l ˙ k ( i ) _ , l ˙ k ( i ) ¯ represent the longitudinal and lateral speed ranges in the Frenet coordinate system, respectively, limiting the reachable set R ˆ k to these boundaries. R k ( i ) is the prerequisite for obtaining a node in graph G .
After obtaining R k ( i ) , the reachable set can be represented in a discretized form. We introduce the discretization operator:
D ˆ k = d i s c r R ˆ k = i = 0 n b   R ˆ k ( i ) R ˆ k
Similarly, D k = d i s c r R k , and the condition for adding edges v k , i , v k + 1 , j in graph G is:
r e a c h t k + 1 R k ( i ) , U , R k + 1 ( j ) .
To efficiently implement this process, we use an adjacency matrix P k k + 1 R q k × q k + 1 to represent edges, where q k is the number of grid cells occupied at time k . Each element p j i of P k k + 1 is binary, defined as:
p j i = 1   if   r e a c h t k + 1 R k ( i ) , U , R k + 1 ( j ) 0   otherwise  
The adjacency matrix P k k + 1 is the main result of the reachability calculation and can be efficiently stored as a sparse matrix. It represents the edges in the directed graph G , indicating the reachable path from one cell to the next over time.
The overall algorithm process is shown in Algorithm 1.
Algorithm 1 Offline Reachability Analysis
Require:   input   set   U
   1: for k = 0 to h do
    2 :           R ˆ k + 1 0     r e a c h t k + 1 R ˆ k 0 , U ,
    3 :             D ˆ k + 1     d i s c r R ˆ k + 1 0                                                 ▷ see (29)
    4 :           v k , i , v k + 1 , j     REACHABLECELLS   ( D ˆ k , D ˆ k + 1 )         ▷ see (30)
    5 :             P k k + 1     CONSTRUCTGRAPH   { ( R k ( i ) , R k + 1 ( i ) ) }           ▷ see (31)
   6: end for
   7: return   P k k + 1 ,   D ˆ k
Calculation of Online Reachable Sets
The goal of online calculation is to exclude obstacles and other vehicles’ predicted trajectories from the reachable region F ( k ) . Therefore, during each propagation step, cells that intersect with F ( k ) are excluded from the reachable set.
We use the adjacency matrix P k k + 1 to propagate the drivable region and introduce a binary variable r k { 0,1 } q to indicate whether each cell belongs to the drivable region:
r k ( i ) = 1   if   x t k ; 0 , x 0 C k ( i ) p r o j D k 0 otherwise .  
Combining with Equation (31), we can write the propagation as:
r ˆ k + 1 = P k k + 1 r k
To exclude F ( k + 1 ) from the binary representation r ˆ k + 1 , we need to discretize F ( k + 1 ) . In motion planning, the vehicle’s shape is often approximated by a bounding semi-disk ρ to represent obstacle occupancy, allowing for efficient collision checking. Similar to r k , we introduce the binary variable o k + 1 to indicate occupancy:
o k + 1 ( i ) = 1 if x t k ; 0 , x 0 C k + 1 ( i ) F k + 1 0 otherwise .
where F k + 1 is typically non-contiguous, so multiple obstacles need to be discretized. For efficiency, only obstacles intersecting the drivable region’s boundary are discretized. Combining Equations (32) and (34), we exclude the occupancy states:
r k + 1 = r ˆ k + 1 ¬ o k + 1
Logical operations ¬ and are executed element-wise. Thus, we can express the propagation and exclusion of the reachable set as:
D k + 1 = d i s c r r e a c h t k + 1 D k , U , d i s c r F t k + 1 , D 0 = x 0
Combining Equations (33) and (35), the simplified exclusion occupancy state is:
r k + 1 = P k k + 1 r k ¬ o k + 1
The overall algorithm process is shown in Algorithm 2.
Algorithm 2 Online Reachability Analysis
Require:   graph   represented   by   propagation   matrices   P k k + 1 , forbidden set F ( k ) , number of time steps h ,   Discrete   sets   of   offline   reachable   sets   D k
   1: for k = 0 to h do
    2 :           x t k ; 0 , x 0     HOMOGENEOUSSOLUTION x 0
    3 :           r ˆ k + 1     IFINAREA P k k + 1 , D k , C k , x t k ; 0 , x 0                   ▷ see (33)
    3 :           o k + 1     OCCUPANCYGRI F k + 1 , C k + 1 , x t k ; 0 , x 0          ▷ see (34)
    4 :           r k + 1     EXCULDEICS   ( r ˆ k + 1 , o k + 1 )                                 ▷ see (35)
    5 :           D k + 1     D I S C R { ( D k , r k + 1 , F t k + 1 ) }                               ▷ see (36)
   6: end for
   7: return   r k + 1 ,   D k + 1
Evaluation of Adversarial Intensity
By comparing the online reachable set D ˆ k and the offline reachable set D k , we can calculate the adversarial intensity index A P k . The ratio of the online to offline reachable sets indicates the adversarial pressure on the AV, as shown in Figure 8. The expected adversarial intensity λ A P is set to 0.2, 0.4, and 0.6 for low, medium, and high intensities. A P k quantifies the deviation from the expected intensity:
A P k = λ A P ω ( k ) D k D ˆ k
where D k and D ˆ k are the sizes of the online and offline reachable sets, respectively, λ A P is the expected intensity, and ω ( k ) is a dynamic weighting factor.
In summary, the algorithm flow of action payoff function is solved as follows Algorithm 3:
Algorithm 3 Calculate Immediate payoffs
Input: Current   state   x t   includes   ego -   and   adv - vehicles ,   control   input   u ( ) ,   expected   speed   v p ,   parameters   α 1 , α 2 , λ A P , λ 1 , λ 2
Output:   Immediate   payoff   R t
    1 :   Initialize   payoff   R t = 0
    2 :   C a l c u l a t e s   a f e t y   p e r f o r m a n c e   F t
           2.1   C a l c u l a t e   d y n a m i c   d a n g e r   F m   b a s e d   o n   r e l a t i v e   d i s t a n c e   a n d   s p e e d
                ▷ see (17)
           2.2   C a l c u l a t e s   t a t i c r   o a d r   i s k F s
                ▷ see (21)
           2.3   C o m b i n e   t o   g e t   o v e r a l l   d a n g e r   F = α 1 F m + α 2 F s
   3: Calculate   efficiency   performance   E t
                                                                      E t = v t v p
   4: Calculate   comfort   cost   C t
                                                                      C t = λ 1 x d x t + N p + λ 2 y d y t + N p
   5: Calculate   vehicle   performance   D P t
                                                                      D P t = F t + E t C t
   6: Calculate   adversarial   intensity   A P t
                                                                      A P k = λ A P ω ( k ) D k D ˆ k
         ▷ see (Algorithm 1 and Algorithm 2)
   7: Calculate   immediate   payoff   R t
                                                                             R t = D P t + A P t
   8: return   R t

4. Solution of the Adversarial Interaction Decision-Making Model

Based on Section 2 and Section 3, we can determine the interaction space ( A , B ) and the corresponding action payoff R t . This section further discusses the game-theoretic process and derives the equilibrium actions using the Stackelberg game model.

4.1. Determining Leader and Follower Vehicles

In normal traffic scenarios, to ensure safety and efficiency, vehicles usually yield to the vehicle ahead. Here, the leading vehicle and the following vehicle establish a leader–follower relationship. The leading vehicle sets the base for interaction, maximizing its driving advantages. The process can be described using the Stackelberg game model.
Specifically, we need to identify the leading and following vehicles. Typically, we judge this based on their positions in the lane. For instance, in a scenario where both vehicles maintain their lane, the one ahead is the leader, and the one behind is the follower. This can be determined by the relative distance between the AV and surrounding vehicles, as shown in Figure 9.
First, the time T T F for the surrounding vehicle to reach the front of the AV is given by:
T T F = Δ y w e / 2 v a s i n φ x
where w e is the width of the AV, accounting for the relative distance between the AV and the surrounding vehicle on both sides. Based on this time, the relative distance in the x-direction is given by:
r e _ x , = r e _ x + v a c o s φ x v e T T F
where r e _ x is the relative distance component in the x-direction.
Thus, the leader–follower status of the AV can be determined based on the relative distance in the x-direction:
E g o V e h i c l e = master       r e _ x , 0     slave           r e _ x , < 0

4.2. Solving for Nash Equilibrium

The evaluation function established in Section 3 can assess the driving ability of the AV and its adversarial intensity with surrounding vehicles at a specific time. However, the specific solution requires a sequence of actions ( A 1 ,   B 1 ) to be evaluated comprehensively over time, rather than at a single moment. According to the Markov decision process, the state–action value function Q ( s t , A 1 ,   B 1 ) can be derived by the cumulative discounted payoff, defined as:
Q s t , A 1 , B 1 = i = 0 n   γ i R t + i + 1
where γ is the discount factor, and R t + i + 1 is the immediate payoff at time t + i + 1 . In this paper, n   = 5, meaning the value function is computed over 5 discrete states for the action sequence.
The AV and Adv adopt actions A 5 and B 2 , respectively, indicating the state–action sequences with a medium expected adversarial intensity. Using Equation (12) to solve the immediate payoff function at each time step, we derive the value function map, showing collision risk and adversarial intensity.
From Figure 10, it is clear that the Adv’s lane-changing behavior increases the risk for the AV. Thus, the AV must take proactive actions (e.g., deceleration) to avoid risks. Similarly, the Adv uses its state–action value function to decide actions like deceleration or lane change based on the payoff function.
First, we need to solve the response function of the follower vehicle to the leader vehicle. In the scenario shown in Figure 10, the AV is the follower, and Adv is the leader. For the leader vehicle, a set of acceleration and turning actions B i is given, and the follower vehicle will derive the optimal response action A ˆ based on this set. The specific action is defined as:
A ˆ = r B i r B i ζ A : Q t 1 s t , ζ , B i Q t 1 s t , A j , B i , A j A
where r is the optimal response function of the follower vehicle, and Q t 1 is the state–action value function of the follower. The smaller Q t 1 is, the better the action A .
To determine the leader’s optimal action, it selects the action that minimizes its own state–action value function Q t 2 :
B = a r g m i n m a x Q t 2 s t , A j , ζ , A j r B i , ζ B
where B * is the leader’s optimal action from its action space. Using Equation (44), the follower’s optimal response to B can be found. This solution A * , B * represents the Stackelberg equilibrium.
For the scenario shown in Figure 10, the state–action value function matrix for both vehicles can be derived, as illustrated in Table 2. The optimal actions for both the AV and the Adv can be identified from this matrix, representing the equilibrium. The same process can be applied to more vehicles by considering the most challenging and dangerous scenarios for evaluation.

5. Experiments

To verify the proposed game-theoretic adversarial interaction-based critical scenario generation method for AVs, this section includes three main experiments.
First, we designed a single adversarial vehicle lane-change scenario to demonstrate the rationality, human-like decision-making, and interpretability of our algorithm. Second, we designed two scenarios to show that our algorithm enables adversarial vehicles to exhibit different levels of adversarial behavior. Third, in the Carla simulator, we tested two autonomous driving systems: an RRT (Rapidly exploring Random Tree)-based path-planning system [33] and a DDPG (Deep Deterministic Policy Gradient)-based system [34], using three levels of adversarial strategies. This demonstrated our method’s efficiency in building critical scenarios and improving testing efficiency.

5.1. Experiment to Demonstrate Adversarial Interaction Capability

To test the adversarial interaction capability of the proposed algorithm in multi-vehicle traffic scenarios, we constructed a test scenario as shown in Figure 11. The yellow vehicle is the Adv, while the other vehicles follow human driving sequences extracted from the NGSIM [35] database. The initial speeds’ and positions’ parameters are shown in Table 3. The decision parameters for the Adv are listed in Table 4. To verify the Adv’s adaptability to different driving conditions, we designed two specific driving styles for the black vehicle.
Interaction in a cautious driving scenario: In this scenario, the black vehicle drives cautiously and remains in its current lane. Due to the lower speed of the red vehicle and the need to occupy the reachable set of the black vehicle, the Adv seeks opportunities to change lanes. While performing adversarial behaviors, the Adv also interacts with surrounding vehicles to ensure safety and efficiency. The results are shown in Figure 12, including the driving paths and speed profiles of the vehicles. The trends of risk and corresponding game objects are depicted in Figure 12c. The game process at three specific moments is illustrated in Figure 12d, which describes the predicted trajectories of the game vehicles, the action space of the Adv, and the selected optimal actions (indicated by “*”).
The adversarial decision-making process consists of three stages: Deceleration and following stage; Lane-changing and acceleration stage; Completing lane change and maintaining lane stage. In the first stage (t = 0~1.8 s), the adversarial vehicle (yellow) travels at high speed. Due to the slower speed of the vehicle ahead and the high risk of direct lane change, it chooses to decelerate and follow. In the second stage (t = 1.8~4.5 s), the yellow vehicle starts to change lanes, and the game vehicle switches to the black vehicle. As shown in the speed profile in Figure 12b, the black vehicle’s speed increases. At this point, decelerating the yellow vehicle would increase the risk of collision. Therefore, the decision outcome for the yellow vehicle is as shown in Figure 12d. In the third stage (t > 4.5 s), after the yellow vehicle completes the lane change, it stabilizes and maintains its current lane.
Interaction in an Aggressive Driving Scenario: In this scenario, the black vehicle drives aggressively, making it difficult for the yellow vehicle to change lanes. The black vehicle’s continuous acceleration complicates adversarial interactions, highlighting the importance of game theory. The results are shown in Figure 13.
The decision-making process includes three stages: Deceleration and following (0~1.8 s): Similar to the previous scenario; Attempting and abandoning lane change (1.8~4.2 s): The black vehicle’s continuous acceleration increases the yellow vehicle’s risk, leading it to revert to its original lane; Stable following (after 4.5 s): Once the black vehicle surpasses the yellow vehicle, the yellow vehicle loses the ability to invade the black vehicle’s reachable set and switches to a stable following strategy.
In Scenario 1, the Adv changes lanes at an appropriate time and speed, ensuring safety and reasonable interaction while occupying the black vehicle’s reachable set. In Scenario 2, the Adv compromises, abandoning the adversarial action. In summary, the proposed algorithm for the adversarial vehicle demonstrates rationality, interpretability, and similarity to human driving behavior in adversarial interactions.

5.2. Multi-Level Adversarial Capability Verification

This section verifies the algorithm’s ability to handle different levels of adversarial behavior by designing two test scenarios. As shown in Figure 14, Scenario 1 is a high-speed dual-lane scenario, while Scenario 2 is a more complex high-speed three-lane scenario. Increasing complexity tests the algorithm’s adaptability.
In Scenario 1, the black vehicle uses an RRT-based path-planning system for decision-making, while the yellow vehicle uses our proposed adversarial planning strategy. The red vehicle follows a pre-set trajectory. We set three adversarial levels for the yellow vehicle: low, medium, and high. We then analyzed the interaction between these vehicles to test the adaptability of the driving behavior under different adversarial conditions. Figure 15 illustrates the interactions for these levels.
Low Adversarial Level ( λ A P = 0.6 ): In a dual-lane scenario, the yellow and red vehicles already occupy part of the black vehicle’s reachable set, reducing the proportion of the black vehicle’s online reachable set to about 0.6. Consequently, the black vehicle does not perform a lane change, instead maintaining its desired speed and then decelerating to follow the red vehicle once it matches its speed. Medium Adversarial Level ( λ A P = 0.4 ): The yellow vehicle begins to ensure its safety and efficiency by attempting to occupy the black vehicle’s reachable area. At 3.2 s, it decides to change lanes, prompting the black vehicle to decelerate in response to the yellow vehicle’s lane change. High Adversarial Level ( λ A P = 0.2 ): The yellow vehicle adopts an aggressive driving style, performing three adversarial maneuvers as shown in the red areas. These occur at 1 s (accelerating to prevent the black vehicle from occupying its lane), 2.4 s (changing lanes earlier and faster than at the medium level), and 5.8 s (decelerating, forcing the black vehicle to decelerate as well).
In Scenario 2, the yellow vehicle uses an RRT-based path-planning system for decision-making, while the black and blue vehicles use our proposed adversarial planning strategy. The red vehicle follows a pre-set fixed trajectory. We designed three sets of experiments with varying adversarial levels for the black and blue vehicles: high–high, high–medium, and high–low. Figure 16 illustrates the interactions for these levels.
In Scenario 2-1, both the black and blue vehicles maintain a high adversarial level. The yellow vehicle stays in the middle lane and decelerates to follow the red vehicle when it gets too close due to the red vehicle’s low speed. The black and blue vehicles first accelerate to their desired speeds and then execute two deceleration maneuvers to maintain a small longitudinal distance from the yellow vehicle, occupying the space needed for the yellow vehicle to change lanes. In Scenario 2-2, the yellow vehicle remains in the middle lane throughout. The black vehicle behaves similarly to Scenario 2-1, while the blue vehicle adopts a more moderate driving style with reduced speed. The longitudinal distance between the blue and yellow vehicles remains moderate, with the blue vehicle exerting less pressure on the yellow vehicle compared to the black vehicle. In Scenario 2-3, the blue vehicle’s adversarial level is lower, leading to more noticeable deceleration and providing enough longitudinal space for the yellow vehicle to execute a right lane change. At 4.8 s, the black vehicle attempts a right lane change to encroach on the yellow vehicle’s space but returns to its original lane due to the high risk of a forced lane change near the red vehicle.

5.3. Validation of Key Scenario Construction

To verify the effectiveness of the game-theoretic adversarial interaction-based planning control algorithm in improving testing efficiency and constructing critical scenarios, we utilized initial states and trajectories from the NGSIM natural driving dataset. We tested two intelligent driving systems: an RRT-based path-planning autonomous driving system [33] and a DDPG-trained autonomous driving system [34]. The tests were conducted on the open-source Carla simulator platform with ROS for communication. The overall testing framework is shown in Figure 17.
We designed multiple driving states for the adversarial vehicle, including fixed natural driving behavior from the NGSIM database and three adversarial levels (low, medium, high) using the game-theoretic planning control algorithm. Considering the complexity of black-box autonomous systems, we selected 20 initial states that can induce lane changes and tested each scenario 200 times. Metrics recorded include collision rate, changes in lateral and longitudinal acceleration, average risk factor, and time to reach the endpoint.
Table 5 shows that the collision rate for the proposed game-theoretic adversarial interaction algorithm increased significantly compared to natural driving scenarios. As the adversarial level increased, the frequency of collisions rose. At λ A P = 0.6 , the collision rate for the RRT-based system increased by 158%, while the DDPG-based system’s rate increased by 385%. At λ A P = 0 . 4, the collision rate for the RRT-based system increased by 347%, while the DDPG-based system’s rate increased by 794%. At λ A P = 0 . 2, the collision rate for the RRT-based system increased by 594%, while the DDPG-based system’s rate increased by 1313%. Testing efficiency significantly improved.
When the test indicators of the AV under test (comfort, risk factor, traffic efficiency, collision rate) remain at a relatively high level, we consider it to be in a critical test scenario. Table 6 provides further details on the impact on the tested objects. With higher adversarial levels, the comfort, risk, and average time to reach the destination for the tested intelligent driving vehicles all showed an increase. Overall, the RRT-based system performed better than the DDPG system, highlighting the proposed game-theoretic adversarial interaction algorithm’s ability to produce more adversarial scenarios and generate a large number of critical scenarios. These critical scenarios are especially valuable due to the lack of scenarios encountered by the DDPG system during training.

6. Conclusions

This study explores the decision-making and planning models for autonomous vehicles based on game-theoretic adversarial interactions through a series of experiments and validations. The behavior characteristics of vehicles during interactive games were examined, and the effects of different driving strategies on adversarial interactions were discussed. A method to generate adversarial interaction behaviors using the Stackelberg game model was proposed, and its application in various traffic scenarios was investigated through numerical experiments. The main conclusions are as follows:
Effectiveness of the Model: Our game-theoretic decision algorithm effectively simulates human driving behavior. It shows robust adversarial interaction capabilities across different driving strategies, providing rational and interpretable results.
Generation of Graded Adversarial Behaviors: The algorithm can generate adversarial behaviors of different levels for testing autonomous driving systems. Experimental results show that the algorithm can produce corresponding adversarial behaviors based on the set adversarial levels, thereby validating the response capabilities of autonomous driving systems.
Improved Testing Efficiency and Critical Scenario Generation: The proposed algorithm enhances testing efficiency significantly. High adversarial levels increase collision rates, offering crucial data for system optimization. It generates critical interaction scenarios, crucial for training deep reinforcement learning-based systems.
Future research will focus on enhancing the algorithm’s adaptability to more complex traffic scenarios. Integrating real-world data and refining the adversarial behavior models will further improve the system’s robustness. Additionally, expanding the scope to include interactions with non-vehicular road users, like pedestrians and cyclists, will be crucial. These steps aim to create safer and more reliable autonomous driving systems.

Author Contributions

Conceptualization, X.Z. and H.L.; methodology, X.Z.; formal analysis, J.W.; investigation, J.W. and H.W.; resources, X.Z.; data curation, J.W.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z., H.L. and J.W.; visualization, X.Z., J.W. and H.W.; supervision, X.Z.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2020AAA0108103.

Data Availability Statement

The scene data in the article comes from NGSIM, and the relevant data link is https://data.transportation.gov/Automobiles/Next-Generation-Simulation-NGSIM-Vehicle-Trajector/8ect-6jqj/about_data (accessed on 12 June 2024), At the same time, the author of this article will open source a software-in-the-loop test system based on CARLA that integrates the full-stack algorithm of autonomous driving, please contact the author for specific data details.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pei, H.; Zhang, J.; Zhang, Y.; Pei, X.; Feng, S.; Li, L. Fault-tolerant cooperative driving at signal-free intersections. IEEE Trans. Intell. Veh. 2023, 8, 121–134. [Google Scholar] [CrossRef]
  2. He, Z.; Pei, H.; Guo, Y.; Yao, D.; Zhang, Y.; Li, L. Comparative analysis of cooperative driving strategies for CAVs at on-ramps. IEEE Trans. Veh. Technol. 2023, 72, 8099–8104. [Google Scholar] [CrossRef]
  3. Lee, S.; Arvin, R.; Khattak, A.J. Advancing investigation of automated vehicle crashes using text analysis of crash narratives and Bayesian analysis. Accid. Anal. Prev. 2023, 181, 106932. [Google Scholar] [CrossRef] [PubMed]
  4. Sun, J.; Zhang, H.; Zhou, H.; Yu, R.; Tian, Y. Scenario-based test automation for highly automated vehicles: A review and paving the way for systematic safety assurance. IEEE Trans. Intell. Transp. Syst. 2022, 23, 14088–14103. [Google Scholar] [CrossRef]
  5. Feng, S.; Yan, X.; Sun, H.; Feng, Y.; Liu, H.X. Intelligent driving intelligence test for autonomous vehicles with naturalistic and adversarial environment. Nat. Commun. 2021, 12, 748. [Google Scholar] [CrossRef] [PubMed]
  6. Yan, X.; Zou, Z.; Feng, S.; Zhu, H.; Sun, H.; Liu, H.X. Learning naturalistic driving environment with statistical realism. Nat. Commun. 2023, 14, 2037. [Google Scholar] [CrossRef] [PubMed]
  7. Feng, S.; Sun, H.; Yan, X.; Zhu, H.; Zou, Z.; Shen, S.; Liu, H.X. Dense reinforcement learning for safety validation of autonomous vehicles. Nature 2023, 615, 620–627. [Google Scholar] [CrossRef] [PubMed]
  8. Zhou, J.; Wang, L.; Wang, X. Online Adaptive Generation of Critical Boundary Scenarios for Evaluation of Autonomous Vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6372–6388. [Google Scholar] [CrossRef]
  9. Ding, W.; Xu, C.; Arief, M.; Lin, H.; Li, B.; Zhao, D. A survey on safety-critical driving scenario generation a methodological perspective. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6971–6988. [Google Scholar] [CrossRef]
  10. Zhao, D.; Lam, H.; Peng, H.; Bao, S.; LeBlanc, D.J.; Nobukawa, K.; Pan, C.S. Accelerated evaluation of automated vehicles safety in lane-change scenarios based on importance sampling techniques. IEEE Trans. Intell. Transp. Syst. 2017, 18, 595–607. [Google Scholar] [CrossRef] [PubMed]
  11. Huang, Z.; Lam, H.; LeBlanc, D.J.; Zhao, D. Accelerated evaluation of automated vehicles using piecewise mixture models. IEEE Trans. Intell. Transp. Syst. 2018, 19, 2845–2855. [Google Scholar] [CrossRef]
  12. Rana, A.; Malhi, A. Building safer autonomous agents by leveraging risky driving behavior knowledge. In Proceedings of the 2021 International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Beijing, China, 15–17 October 2021; pp. 1–6. [Google Scholar]
  13. Ding, W.; Lin, H.; Li, B.; Eun, K.J.; Zhao, D. Semantically adversarial driving scenario generation with explicit knowledge integration. arXiv 2021, arXiv:2106.04066. [Google Scholar]
  14. Smullyan, R.M. First-Order Logic; Courier Corporation: Chelmsford, MS, USA, 1995. [Google Scholar]
  15. Klischat, M.; Liu, E.I.; Holtke, F.; Althoff, M. Scenario factory: Creating safety-critical traffic scenarios for automated vehicles. In Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, 20–23 September 2020; pp. 1–7. [Google Scholar]
  16. Bennett, J. OpenStreetMap; Packt Publishing Ltd.: Birmingham, UK, 2010. [Google Scholar]
  17. Lopez, P.A.; Behrisch, M.; Bieker-Walz, L.; Erdmann, J.; Flötteröd, Y.P.; Hilbrich, R.; Lücken, L.; Rummel, J.; Wagner, P.; Wiessner, E. Microscopic traffic simulation using SUMO. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2575–2582. [Google Scholar]
  18. Wachi, A. Failure-scenario maker for rule-based agent using multi-agent adversarial reinforcement learning and its application to autonomous driving. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macau, China, 10–16 August 2019; pp. 6006–6012. [Google Scholar]
  19. Xiong, Z.; Xu, H.; Li, W.; Cai, Z. Multi-source adversarial sample attack on autonomous vehicles. IEEE Trans. Veh. Technol. 2021, 70, 2822–2835. [Google Scholar] [CrossRef]
  20. Xu, X.; Zhang, J.; Li, Y.; Wang, Y.; Yang, Y.; Shen, H.T. Adversarial attack against urban scene segmentation for autonomous vehicles. IEEE Trans. Ind. Inform. 2021, 17, 4117–4126. [Google Scholar] [CrossRef]
  21. Kong, Z.; Guo, J.; Li, A.; Liu, C. PhysGAN: Generating physical-world-resilient adversarial examples for autonomous driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14242–14251. [Google Scholar]
  22. O’Kelly, M.; Sinha, A.; Namkoong, H.; Duchi, J.; Tedrake, R. Scalable end-to-end autonomous vehicle testing via rare-event simulation. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, QC, Canada, 3–8 December 2018; pp. 9827–9838. [Google Scholar]
  23. Chen, B.; Chen, X.; Wu, Q.; Li, L. Adversarial evaluation of autonomous vehicles in lane-change scenarios. IEEE Trans. Intell.Transp. Syst. 2022, 23, 10333–10342. [Google Scholar] [CrossRef]
  24. Wang, J.; Pun, A.; Tu, J.; Manivasagam, S.; Sadat, A.; Casas, S.; Ren, M.; Urtasun, R. AdvSim: Generating safety-critical scenarios for self-driving vehicles. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 9904–9913. [Google Scholar]
  25. Szczepanski, R.; Tarczewski, T.; Erwinski, K. Energy efficient local path planning algorithm based on predictive artificial potential field. IEEE Access 2022, 10, 39729–39742. [Google Scholar] [CrossRef]
  26. Wei, C.; Romano, R.; Merat, N.; Wang, Y.; Hu, C.; Taghavifar, H.; Hajiseyedjavadi, F.; Boer, E.R. Risk-based autonomous vehicle motion control with considering human driver’s behavior. Transp. Res. Part C Emerg. Technol. 2019, 107, 1–14. [Google Scholar] [CrossRef]
  27. Malin, F.; Norros, I.; Innamaa, S. Accident risk of road and weather conditions on different road types. Accid. Anal. Prev. 2019, 122, 181–188. [Google Scholar] [CrossRef] [PubMed]
  28. Liu, E.I.; Würsching, G.; Klischat, M.; Althoff, M. CommonRoad-Reach: A toolbox for reachability analysis of automated vehicles. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; pp. 2313–2320. [Google Scholar]
  29. Althoff, M.; Kochdumper, N. CORA 2016 Manual; TU Munich: Garching, Germany, 2016; p. 85748. [Google Scholar]
  30. Chen, X.; Sankaranarayanan, S.; Abraham, E. Flow* 1.2: More effective to play with hybrid systems. In ARCH14-15. 1st and 2nd International Workshop on Applied veRification for Continuous and Hybrid Systems; EasyChair: Stockport, UK, 2015; Volume 34, pp. 152–159. [Google Scholar]
  31. Geretti, L.; Sandretto, J.A.D.; Althoff, M.; Benet, L.; Collins, P.; Forets, M.; Ivanova, E.; Li, Y.; Mitra, S.; Mitsch, S.; et al. ARCH-COMP23 category report: Continuous and hybrid systems with nonlinear dynamics. In 10th International Workshop on Applied Verification of Continuous and Hybrid Systems; EasyChair: Stockport, UK, 2023; pp. 61–88. [Google Scholar]
  32. Li, R.; Zhu, H.; Banach, R. Translating and verifying Cyber–Physical systems with shared-variable concurrency in SpaceEx. Internet Things 2023, 23, 100864. [Google Scholar] [CrossRef]
  33. Eshtehardian, S.; Khodaygan, S. A continuous rrt*-based path planning method for non-holonomic mobile robots using b-spline curves. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 8693–8702. [Google Scholar] [CrossRef]
  34. Radwan, M.O.; Sedky, A.A.H.; Mahar, K.M. Obstacles avoidance of self-driving vehicle using deep reinforcement learning. In Proceedings of the 2021 31st International Computer Theory and Applications (ICCTA), Alexandria, Egypt, 1–13 December 2021; pp. 215–222. [Google Scholar]
  35. Coifman, B.; Li, L. A critical evaluation of the Next Generation Simulation (NGSIM) vehicle trajectory dataset. Transp. Res. Part B Methodol. 2017, 105, 362–377. [Google Scholar] [CrossRef]
Figure 1. Conversion Relationship between Frenet Coordinate System and Global Coordinate System.
Figure 1. Conversion Relationship between Frenet Coordinate System and Global Coordinate System.
Machines 12 00538 g001
Figure 2. Kinematic model-based trajectory prediction at the moments T 1 , T 2 , and T 3 with a prediction duration of N p .
Figure 2. Kinematic model-based trajectory prediction at the moments T 1 , T 2 , and T 3 with a prediction duration of N p .
Machines 12 00538 g002
Figure 3. Action Spaces for the Vehicle and Adversarial Vehicle under Different Driving States.
Figure 3. Action Spaces for the Vehicle and Adversarial Vehicle under Different Driving States.
Machines 12 00538 g003
Figure 4. Safety Distance Model for Vehicles in the Same Direction.
Figure 4. Safety Distance Model for Vehicles in the Same Direction.
Machines 12 00538 g004
Figure 5. Schematic of the hazard level of the surrounding vehicles, where the red vehicle is the ego vehicle and the others are the adversarial vehicles.
Figure 5. Schematic of the hazard level of the surrounding vehicles, where the red vehicle is the ego vehicle and the others are the adversarial vehicles.
Machines 12 00538 g005
Figure 6. Road risk mapping.
Figure 6. Road risk mapping.
Machines 12 00538 g006
Figure 7. (a) Offline Reachable Set: Calculate the reachable cells C k + 1 at time t k + 1 and encode them into the graph. (b) Online Reachable Set: Remove nodes that intersect with forbidden sets F ( t k ) or have no predecessor cells (indicated in dark gray).
Figure 7. (a) Offline Reachable Set: Calculate the reachable cells C k + 1 at time t k + 1 and encode them into the graph. (b) Online Reachable Set: Remove nodes that intersect with forbidden sets F ( t k ) or have no predecessor cells (indicated in dark gray).
Machines 12 00538 g007
Figure 8. Visualization of online reachable set and offline reachable set calculation.
Figure 8. Visualization of online reachable set and offline reachable set calculation.
Machines 12 00538 g008
Figure 9. Process for Determining Leader and Follower Vehicles.
Figure 9. Process for Determining Leader and Follower Vehicles.
Machines 12 00538 g009
Figure 10. Illustration of Solving the State–Action Value Function.
Figure 10. Illustration of Solving the State–Action Value Function.
Machines 12 00538 g010
Figure 11. Initial experimental scenarios for adversarial interactivity validation.
Figure 11. Initial experimental scenarios for adversarial interactivity validation.
Machines 12 00538 g011
Figure 12. Interaction in a cautious driving scenario. (a) Path of this process. (b) Speed profile. (c) Performance of vehicles and adversarial intensity between vehicles at different times, including game vehicles (black, blue, red). (d) Basic actions of the ego vehicle and the interactive vehicle.
Figure 12. Interaction in a cautious driving scenario. (a) Path of this process. (b) Speed profile. (c) Performance of vehicles and adversarial intensity between vehicles at different times, including game vehicles (black, blue, red). (d) Basic actions of the ego vehicle and the interactive vehicle.
Machines 12 00538 g012
Figure 13. Interaction in an Aggressive Driving Scenario. (a) Path of the process. (b) Speed profile. (c) Driving performance and adversarial intensity between vehicles (black, blue, and red) at different times. (d) Basic functions of the ego vehicle and interacting vehicles.
Figure 13. Interaction in an Aggressive Driving Scenario. (a) Path of the process. (b) Speed profile. (c) Driving performance and adversarial intensity between vehicles (black, blue, and red) at different times. (d) Basic functions of the ego vehicle and interacting vehicles.
Machines 12 00538 g013
Figure 14. Test Scenarios for Evaluating Adversarial Capability Levels.
Figure 14. Test Scenarios for Evaluating Adversarial Capability Levels.
Machines 12 00538 g014
Figure 15. The left side shows the planning results for different adversarial levels, and the right side shows the speed profiles of the black vehicle (autonomous) and the yellow vehicle (adversarial). Red areas in the speed profiles indicate adversarial behavior.
Figure 15. The left side shows the planning results for different adversarial levels, and the right side shows the speed profiles of the black vehicle (autonomous) and the yellow vehicle (adversarial). Red areas in the speed profiles indicate adversarial behavior.
Machines 12 00538 g015
Figure 16. The left side shows the planning results for different adversarial levels, and the right side shows the speed profiles of the yellow vehicle (autonomous), the black vehicle (adversarial), and the blue vehicle (adversarial). Red areas in the speed profiles indicate adversarial behavior.
Figure 16. The left side shows the planning results for different adversarial levels, and the right side shows the speed profiles of the yellow vehicle (autonomous), the black vehicle (adversarial), and the blue vehicle (adversarial). Red areas in the speed profiles indicate adversarial behavior.
Machines 12 00538 g016
Figure 17. Test Framework Based on Carla (0.9.13) Software-in-the-Loop.
Figure 17. Test Framework Based on Carla (0.9.13) Software-in-the-Loop.
Machines 12 00538 g017
Table 1. Relative Risk Factors R a under Different Road Conditions.
Table 1. Relative Risk Factors R a under Different Road Conditions.
Road ConditionProbabilityNumber of Accidents Relative   Risk   R a
Good0.84579920.9
Rain0.0889781.06
Rain and Snow0.005771.46
Snow0.06214172.18
Table 2. State–Action Values for Vehicle Interactions.
Table 2. State–Action Values for Vehicle Interactions.
State–Behavior ValuesMovement of the Vehicle in Front (Leader)
B 1 B 2 B 3 B 4 B 5
Movement of the vehicle behind
(follower)
A 1 8.36,
10.06
12.46,
17.36
12.31,
18.50
10.57,
19.17
11.50,
16.92
A 3 5.03,
5.07
6.12,
5.26
4.58,
4.66
8.03,
8.17
6.05,
6.17
A 4 6.59,
5.68
9.45,
9.46
12.00,
6.61
13.19,
8.35
9.42,
7.67
A 5 5.07,
7.31
3.43,
5.46
6.15,
15.11
6.16,
11.73
4.22,
10.15
Table 3. Initial setting of parameters.
Table 3. Initial setting of parameters.
Vehicle IDInitial Position (m)Initial Speed
Adv vehicle (yellow)(27, −1.75)13 m/s
Ego vehicle (white)(2, 1.75)10 m/s
Background vehicle (red)(51, −2)9 m/s
Background vehicle (blue)(74, 1.75)13 m/s
Table 4. Decision-making parameterization.
Table 4. Decision-making parameterization.
ParametersValue
Prediction Period T / s 0.1
Prediction Time Horizon N p / s 2
Acceleration for normal driving a m a x / ( m / s 2 ) 2
Deceleration for normal driving a m i n / ( m / s 2 ) −3
Extra transverse angular velocity w / ( r a d / s ) 0.15
Value Function Discount Factor γ 0.98
Acceleration/deceleration/lane change corresponding cost factor λ 1 / λ 2 0.02/0.04
Dynamic/Road risk assessment factor α 1 / α 2 0.8/0.2
Expected speed v p / ( m / s ) 13
Road width W / m 3.5
Safety constant k 1 0.17
Spacing constant b 1 10
Cell size d x s / d x l ( m )0.5
Radius ρ 1.25
Level of adversarial desired λ A P 0.6
Table 5. Collision Rates of Different AVs Under Various Adversarial Levels.
Table 5. Collision Rates of Different AVs Under Various Adversarial Levels.
Adversarial Vehicle
Level of Adversarial Desired
None
A P t = 0
Low
λ A P = 0.6
Moderate
λ A P = 0.4
High
λ A P = 0.2
NGSIM [35]0.81%0.72%1.09%1.94%
RRT [33]1.68%2.67 (0.99)% 5.83 (4.15)% 9.98 (8.30)%
DDPG [34]1.28%4.94 (3.66)% 10.17 (8.89)% 16.81 (15.53)%
The arrows indicate that the data is incremental.
Table 6. Impact of Different Adversarial Levels on AV States.
Table 6. Impact of Different Adversarial Levels on AV States.
AV under Test
DDPGRRT
a + ( m / s 3 ) a g v ( D P t ) a g v ( T f i n i s h ) a + ( m / s 3 ) a g v ( D P t ) a g v ( T f i n i s h )
A P t = 0 1.224.4810.540.944.4410.08
λ A P = 0.6 2.17
( 0.95 )
7.29
( 2.81 )
10.76
( 0.22 )
0.89
( 0.05 )
6.82
( 2.38 )
10.49
( 0.41 )
λ A P = 0.4 3.05
( 1.83 )
13.81
( 9.33 )
11.37
( 0.83 )
1.27
( 0.33 )
9.31
( 4.87 )
10.88
( 0.8 )
λ A P = 0.2 3.65
( 2.43 )
15.86
( 11.38 )
11.93
( 1.39 )
2.34
( 1.4 )
14.57
( 10.13 )
11.34
( 1.26 )
The arrows indicate that the data is incremental, ↓ indicate that the data is decreased.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zheng, X.; Liang, H.; Wang, J.; Wang, H. Game-Theoretic Adversarial Interaction-Based Critical Scenario Generation for Autonomous Vehicles. Machines 2024, 12, 538. https://doi.org/10.3390/machines12080538

AMA Style

Zheng X, Liang H, Wang J, Wang H. Game-Theoretic Adversarial Interaction-Based Critical Scenario Generation for Autonomous Vehicles. Machines. 2024; 12(8):538. https://doi.org/10.3390/machines12080538

Chicago/Turabian Style

Zheng, Xiaokun, Huawei Liang, Jian Wang, and Hanqi Wang. 2024. "Game-Theoretic Adversarial Interaction-Based Critical Scenario Generation for Autonomous Vehicles" Machines 12, no. 8: 538. https://doi.org/10.3390/machines12080538

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop