
Hindawi
Journal of Advanced Transportation

Volume 2022, Article ID 6601014, 18 pages
https://doi.org/10.1155/2022/6601014
Research Article
Stacking-Based Ensemble Learning Method for the Recognition of
the Pedestrian Crossing Intention
Hongjia Zhang, Song Gao, and Pengwei Wang

School of Transportation and Vehicle Engineering, Shandong University of Technology, Zibo 255000, China
Correspondence should be addressed to Song Gao; [email protected]
Received 28 June 2022; Revised 13 September 2022; Accepted 12 October 2022; Published 2 November 2022
Academic Editor: Zhenning Li
Copyright © 2022 Hongjia Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Accurate recognition of pedestrian crossing intentions is essential for the safe operation of autonomous vehicles on urban roads. However, current pedestrian crossing intention recognition models suffer from relatively low recognition accuracy and a short recognition advance time. To address these problems, this paper studies a recognition model of pedestrian crossing intention. Firstly, pedestrian and vehicle crossing data were collected with a laser radar and a high-definition monitor, and 1980 groups of valid samples were selected. Secondly, the set of parameters characterizing pedestrian crossing intention was determined through statistical analysis. Finally, this paper proposes a pedestrian crossing intention recognition model based on stacking ensemble learning. The ensemble learning framework integrates random forest (RF), support vector machine (SVM), long short-term memory network (LSTM), and bidirectional LSTM with an attention mechanism (AT-Bi-LSTM). Compared with traditional machine learning methods, the proposed method achieves higher recognition accuracy. The recognition accuracy reaches 95.36% when recognition is performed 0.5 s before the pedestrian crosses the zebra crossing and 89.27% when it is performed 1 s before crossing. This research is of great significance for building more intelligent pedestrian-vehicle collaboration and for promoting the industrial application of autonomous vehicles.
1. Introduction
A zebra crossing is an area where pedestrians cross the road, and it is also a potential conflict area between vehicles and pedestrians [1, 2]. According to the accident statistics report issued by the road traffic management department, the number of pedestrian deaths rose from 14,923 in 2015 to 17,473 in 2019, and the proportion of pedestrian deaths among all traffic accident deaths increased from 25.72% to 27.84%. The number of injured pedestrians rose from 34,379 to 45,495. Further calculations show that from 2015 to 2019, an average of 1.2 pedestrians were injured or killed in each pedestrian-related accident [3–7]. These data show that vehicle-pedestrian accidents in China have worsened year by year in recent years, with both the absolute number and the proportion of fatalities rising. They also confirm that pedestrians are a vulnerable group in the road traffic system: once a pedestrian-vehicle accident occurs, even a slight scrape may cause serious pedestrian injury or even death [8, 9].
With the rapid development of technology, autonomous vehicles are getting closer to reality. Autonomous vehicles have significant potential to reduce collision-related casualties, improve traffic conditions, and reduce traffic jams and vehicle emissions. In 2017, the U.S. Department of Transportation released the Autopilot System Safety Vision 2.0, which aims to improve the safety and reliability of automated driving systems and thereby reduce the accident rate [10]. In 2016, the China Association of Automotive Engineers released a technology roadmap for autonomous vehicles, which states that every vehicle will be equipped with a fully automated or assisted driving system between 2026 and 2030 to improve road traffic safety [11].
Driving safely on urban roads is an important challenge for autonomous vehicles, particularly because urban roads carry large numbers of pedestrians. Pedestrians are relatively complex individuals whose movement behavior is affected by factors such as their own emotions, the traffic environment, and the weather. Through vision, sound, gestures, and actions, a human driver can understand pedestrians' intentions and then interact with them accurately. For autonomous vehicles, however, it is difficult to understand pedestrians' intentions and accurately complete the pedestrian-vehicle interaction [12, 13]. Since the zebra crossing is the main interaction area between pedestrians and vehicles, this paper studies a recognition model of pedestrian crossing intention.
The main contributions of this paper are as follows:
(1) Current pedestrian crossing intention recognition models are mainly built on traditional machine learning or deep learning algorithms, and their recognition accuracy is relatively low. This paper proposes a framework that combines machine learning algorithms to improve recognition accuracy, namely, a stacking ensemble learning framework that integrates four classical algorithms.
(2) Current pedestrian crossing intention recognition models usually cannot achieve high recognition accuracy and a long recognition advance time at the same time. Unlike these models, the proposed model substantially increases the recognition advance time while maintaining recognition accuracy.
2. Related Works
Researchers in China and abroad have carried out extensive research on pedestrian crossing intention recognition and have achieved fruitful results.
Mingus et al. [14] considered the trajectory and posture of pedestrians and established a pedestrian crossing intention recognition model based on a Gaussian dynamic model, with a recognition accuracy of 80%. Quintero et al. [15, 16] collected posture data of pedestrians crossing the zebra crossing, represented pedestrian movement posture by 11 key points of the human body, and established a recognition model based on the hidden Markov model; when recognition is performed 0.125 s in advance, the accuracy of the model is 80%. Fang and Lopez [17] collected a large amount of posture data of pedestrians crossing the zebra crossing, calculated direction parameters between different points from the located human-body key points, and established a recognition model with the support vector machine (SVM) algorithm, reaching a high recognition accuracy of 93%. Brehar et al. [18] proposed a method to identify pedestrian crossing behavior using a monocular far-infrared camera. The method can still effectively identify pedestrian street-crossing actions in low-visibility environments such as nighttime, fog, heavy rain, or smoke, with an accuracy of 93.28%. Căilean et al. [19] proposed a novel architecture for improving pedestrian safety at crosswalks that can effectively detect pedestrians and predict their street-crossing actions.
Völz et al. [20] established a pedestrian crossing intention recognition model based on a data-driven method. The main input parameters of the model include the distance between pedestrians and the zebra crossing and the distance between vehicles and the zebra crossing, and the recognition accuracy is 84.74%. Camara et al. [21] collected a large amount of pedestrian crossing data and established a recognition model by analyzing the relative position between pedestrians and vehicles, with an accuracy of up to 96%. Zhao et al. [22] used lidar to collect a large amount of pedestrian crossing data and established a recognition model based on an artificial neural network (ANN) by analyzing the motion parameters of pedestrians and vehicles before crossing the zebra crossing; when recognition is performed 0.5 s in advance, the accuracy is 92.6%. Zhang et al. [23] used a bidirectional long short-term memory network with an attention mechanism (AT-Bi-LSTM) to establish a recognition model, whose accuracy is 90.68% when recognition is performed 0.6 s in advance. Ghori et al. [24] proposed a new recognition framework that combines convolutional neural networks (CNN) and LSTM networks; when recognition is performed 1 s in advance, its accuracy is relatively low, at only 72%. Schulz and Stiefelhagen [25] and Brouwer et al. [26] established recognition models by estimating the head pose of pedestrians crossing the zebra crossing. Hashimoto et al. [27] collected intersection information and established a recognition model based on the dynamic Bayesian network (DBN). Schneemann and Heinemann [28] combined image data and motion parameters of pedestrians crossing the zebra crossing and established a recognition model based on SVM.
The literature review shows that research on pedestrian crossing intentions is relatively mature. The recognition accuracy of existing intention models is already good, with the highest values exceeding 90%. However, their recognition advance time is relatively short: existing models do not appear able to maintain high recognition accuracy while also providing a long recognition advance time.
In general, pedestrian crossing intention recognition can be regarded as a time-series modeling and forecasting problem. Therefore, this paper first collects the continuous data stream during the 2.1 s before pedestrians cross the zebra crossing, using a laser radar and a high-definition (HD) monitor. Secondly, the characteristic parameters related to crossing intention are extracted. These mainly include pedestrian speed, the distance between the pedestrian and the zebra crossing, age, gender, vehicle speed, the distance between the vehicle and the zebra
crossing, and time to collision (TTC). Finally, a pedestrian crossing intention recognition model is established based on stacking ensemble learning, integrating the SVM, random forest (RF), LSTM, and AT-Bi-LSTM algorithms. Figure 1 shows the research framework of this paper.
This paper is divided into five parts: introduction, related works, proposed solution, experimental results, and conclusions. The first and second parts analyze the conflict between pedestrians and vehicles and explain the significance of research on pedestrian crossing intention recognition. The third part introduces the crossing intention recognition algorithm, which is based on stacking ensemble learning and integrates the SVM, random forest (RF), LSTM, and AT-Bi-LSTM algorithms; it also introduces the data acquisition equipment, mainly a laser radar and an HD monitor, and the acquisition methods. The fourth part analyzes the characteristic parameters of pedestrian crossing intention and obtains the characteristic parameter set; it then analyzes the results of the stacking-based pedestrian crossing intention recognition model and compares it with traditional intention recognition algorithms. The fifth part presents the conclusions of this paper.
3. Proposed Solution
3.1. Methodology
3.1.1. Ensemble Learning. Ensemble learning improves the performance of machine learning by combining multiple models, which generally yields better prediction performance than a single model. It is widely used in well-known international machine learning competitions (Netflix, KDD2009, and Kaggle), where it has achieved high rankings. Ensemble learning can be used to solve both classification and regression tasks [29].
Model integration in ensemble learning faces two main problems: (1) how to change the distribution or weights of the data and (2) how to combine multiple weak classifiers into a strong classifier. There are three main solutions to these problems: (1) bagging, which reduces variance; (2) boosting, which reduces bias; and (3) stacking, which improves prediction results [30–32]. Because stacking ensemble learning is more effective at improving recognition accuracy, this paper adopts stacking ensemble learning.
Stacking is a typical representative of ensemble learning methods. The individual weak classifiers are called base classifiers, and the classifier used to combine them is called the meta-classifier. The base classifiers are usually heterogeneous.
3.1.2. Base Classifier and Meta-Classifier
(1) SVM-Base Classifier. SVM [33] is a commonly used supervised machine learning algorithm. It is a typical linear binary classifier, and training an SVM can be regarded as the process of solving for the optimal classification hyperplane. For the SVM, the key is to determine the kernel function, the penalty factor C, and the kernel function parameter g. The radial basis kernel function is selected, and the values of C and g are determined by the grid search method. In this paper, when the pedestrian intention is identified 0 s before crossing the zebra crossing, C and g are 36 and 2.73, respectively. When the pedestrian intention is identified 0.5 s before crossing the zebra crossing, C and g are 48 and 2.32, respectively. When the pedestrian intention is identified 1 s before crossing the zebra crossing, C and g are 45 and 2.08, respectively. Since SVM is a common and mature algorithm, it is not described in more detail in this paper.
(2) RF-Base Classifier. RF [34] is a classifier composed of a large number of decision trees and is regarded as an ensemble learning method. Multiple decision tree classifiers are trained on samples drawn with replacement (bootstrap), and each decision tree is trained independently of the others. The trees are integrated into an RF classifier, which obtains the final classification result by voting. Achieving a good recognition result requires tuning the hyperparameters, namely, the number of decision trees and the maximum number of features. The grid search method is also used to determine these two values. When the pedestrian intention is identified 0 s before crossing the zebra crossing, the number of decision trees and the maximum number of features are 80 and 5, respectively. When the pedestrian intention is identified 0.5 s before crossing the zebra crossing, they are 115 and 5, respectively. When the pedestrian intention is identified 1 s before crossing the zebra crossing, they are 125 and 5, respectively.
(3) LSTM-Base Classifier. In the late 1990s, Hochreiter and Schmidhuber proposed LSTM on the basis of the RNN [35]; to some extent, it overcomes the vanishing and exploding gradient problems of backpropagation. The LSTM network introduces the concept of "gates," namely, the input gate, the forget gate, and the output gate, which together form the memory unit of the network. Their main purpose is to selectively delete and retain associated information in the data, so that the cell state is continuously updated and the recognition accuracy of the model increases. The grid search method was used to determine the hyperparameter values. When the pedestrian intention is identified 0 s before crossing the zebra crossing, the learning rate, number of hidden units, and dropout values are 0.01, 128, and 0.4, respectively. When the pedestrian intention is recognized 0.5 s before crossing the zebra crossing, they are 0.05, 100, and 0.4, respectively. When the pedestrian intention is recognized 1 s before crossing
[Figure 1 diagram, bottom to top: laser radar and HD monitor; characterization parameters (pedestrian speed, distance between pedestrian and zebra crossing, age, gender, vehicle speed, distance between vehicle and zebra crossing, TTC); sequence data preprocessing (filtering, normalization); statistical analysis of feature parameters of pedestrian crossing intention recognition; stacking-based ensemble learning with base classifiers RF, SVM, LSTM, and AT-BiLSTM and meta-classifier Bi-LSTM; pedestrian crossing intention recognition based on natural observation data.]
Figure 1: Research framework of pedestrian crossing intention recognition: the laser radar and HD monitor acquire data before pedestrians cross the zebra crossing. After the data are filtered, the characterization parameters of pedestrians' crossing intention are obtained. After preprocessing, the characterization parameters are input into the stacking learning algorithm, and the pedestrian crossing intention recognition model is established.
the zebra crossing, the learning rate, number of hidden units, and dropout values are 0.001, 100, and 0.5, respectively. Adam was used as the optimizer. In addition, the LSTM network handles the dependence between earlier and later input data, so that the cell unit has a longer memory capacity. The specific working steps of the LSTM network are as follows:
Forget gate: its main function is to delete useless information from the cell unit; which information is deleted is determined by the sigmoid function.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$
where $\sigma$ is the sigmoid function of the forget gate, $W_f$ is the weight matrix, and $b_f$ is the bias term. The output $f_t$ lies in $[0, 1]$; the larger its value, the less information is forgotten.
Input gate: it updates the information in the cell unit. A sigmoid layer and a tanh layer determine the information used for the update.
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$
$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{3}$$
where $\sigma$ is the sigmoid function of the input gate, $\tanh$ is the activation of the candidate state, $W_i$ and $W_c$ are weight matrices, $b_i$ and $b_c$ are bias terms, $i_t$ is the input gate value, and $\tilde{C}_t$ is the candidate cell state produced by the tanh layer.
Through formulas (1)–(3), the final updated state value of the cell unit is obtained:
$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \tag{4}$$
where $C_{t-1}$ is the cell state at the previous moment.
The main function of the output gate is to pass the associated information to the cell unit at the next moment:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}$$
where $o_t$ is the output value of the output gate, $W_o$ is the weight matrix, and $b_o$ is the bias term. The final output $h_t$ of the cell at the current moment can be expressed as follows:
$$h_t = o_t \cdot \tanh(C_t) \tag{6}$$
(4) AT-Bi-LSTM-Meta Classifier. Pedestrian crossing intention recognition can be regarded as a sequence recognition problem. The movement state of pedestrians before crossing the zebra crossing reflects their crossing decision, and the data at one moment before crossing are strongly correlated with the data at the next moment. To better capture the characteristic information of pedestrian crossing intentions and fully exploit the correlation of the sequence data over a period of time before crossing the zebra crossing, this paper adopts Bi-LSTM [36].
The input of the Bi-LSTM model at time $t$ is $x_t$. During information processing, the forward layer of the Bi-LSTM is updated from the first to the last time step as follows:
$$h^{fw}_t = H\left(W_{fw} x_t + W_{fw1} h^{fw}_{t-1} + b_{fw}\right) \tag{7}$$
where $H$ is the output function of the forward layer, $W_{fw}$ is the weight matrix from the input layer to the forward layer, $W_{fw1}$ is the weight matrix between forward layers at adjacent time steps, and $b_{fw}$ is the bias term.
The backward layer of the Bi-LSTM is then updated from the last to the first time step as follows:
$$h^{bw}_t = H'\left(W_{bw} x_t + W_{bw1} h^{bw}_{t+1} + b_{bw}\right) \tag{8}$$
where $H'$ is the output function of the backward layer, $W_{bw}$ is the weight matrix from the input layer to the backward layer, $W_{bw1}$ is the weight matrix between backward layers at adjacent time steps, and $b_{bw}$ is the bias term.
Equation (9) gives the final output of the Bi-LSTM model after the forward and backward outputs are superimposed:
$$h_t = \tilde{H}\left(W_{fw2} h^{fw}_t + W_{bw2} h^{bw}_t + b_o\right) \tag{9}$$
where $\tilde{H}$ is the output function of the output layer, $W_{fw2}$ is the weight matrix from the forward layer to the output layer, and $W_{bw2}$ is the weight matrix from the backward layer to the output layer.
[Figure 2 diagram, bottom to top: input layer; forward layer (h_fw1, h_fw2, h_fw3, ..., h_fwt) and backward layer (h_bw1, h_bw2, h_bw3, ..., h_bwt) of the Bi-LSTM; attention layer over h_1, h_2, h_3, ..., h_t; softmax output layer.]
Figure 2: AT-Bi-LSTM structure: the input layer receives the data, which flow into the forward and backward layers of the Bi-LSTM to extract important clues. The attention layer removes useless information from the data and extracts key features. The softmax layer outputs the pedestrian intention.
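Equations (1)–(9), together with the attention pooling of Eq. (10) below, can be illustrated with the minimal NumPy sketch that follows. It is not the authors' implementation, only a sketch under stated assumptions: the weights are random rather than trained, both H and H′ are assumed to be LSTM layers built from Eqs. (1)–(6), the output function H̃ of Eq. (9) is assumed to be tanh, and the three-class softmax output (WWI, WSI, SWI) uses a hypothetical random weight matrix. The helper names `lstm_step`, `run_lstm`, `bilstm`, and `attention_pool` are hypothetical. The input size of 7 mirrors the seven characterization parameters in Figure 1, and T = 26 frames is an assumed window of roughly 2.1 s at the 12.5 Hz lidar rate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(rng, n_in, n_hid):
    """Random weights for gates f, i, o and candidate c; each acts on [h_{t-1}, x_t]."""
    n_cat = n_hid + n_in
    return {g: (rng.normal(0, 0.1, (n_hid, n_cat)), np.zeros(n_hid)) for g in "fioc"}

def lstm_step(p, x_t, h_prev, c_prev):
    """One LSTM time step following Eqs. (1)-(6)."""
    z = np.concatenate([h_prev, x_t])             # [h_{t-1}, x_t]
    f = sigmoid(p["f"][0] @ z + p["f"][1])        # forget gate, Eq. (1)
    i = sigmoid(p["i"][0] @ z + p["i"][1])        # input gate, Eq. (2)
    c_tilde = np.tanh(p["c"][0] @ z + p["c"][1])  # candidate state, Eq. (3)
    c = f * c_prev + i * c_tilde                  # cell state update, Eq. (4)
    o = sigmoid(p["o"][0] @ z + p["o"][1])        # output gate, Eq. (5)
    h = o * np.tanh(c)                            # cell output, Eq. (6)
    return h, c

def run_lstm(p, xs, n_hid):
    """Run one LSTM layer over a (T, n_in) sequence; returns (T, n_hid) outputs."""
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    outputs = []
    for x_t in xs:
        h, c = lstm_step(p, x_t, h, c)
        outputs.append(h)
    return np.array(outputs)

def bilstm(p_fw, p_bw, W_fw2, W_bw2, b, xs, n_hid):
    """Forward pass in time order, backward pass from t = T down to 1, then Eq. (9)."""
    h_fw = run_lstm(p_fw, xs, n_hid)               # Eq. (7)
    h_bw = run_lstm(p_bw, xs[::-1], n_hid)[::-1]   # backward layer, re-aligned to time order
    return np.tanh(h_fw @ W_fw2.T + h_bw @ W_bw2.T + b)  # tanh assumed for H~

def attention_pool(H_seq, c_vec):
    """Attention of Eq. (10); P holds the Bi-LSTM outputs h_1..h_T as columns."""
    P = H_seq.T                          # shape (d, T)
    Q = np.tanh(P)
    scores = c_vec @ Q                   # c^T Q, shape (T,)
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()                   # softmax over time steps
    eps = P @ beta                       # P beta^T, shape (d,)
    return np.tanh(eps), beta            # h* and the attention weights

rng = np.random.default_rng(0)
T, n_in, n_hid, d, n_cls = 26, 7, 8, 8, 3
xs = rng.normal(size=(T, n_in))          # one synthetic feature sequence
p_fw, p_bw = init_lstm(rng, n_in, n_hid), init_lstm(rng, n_in, n_hid)
W_fw2, W_bw2 = rng.normal(0, 0.1, (d, n_hid)), rng.normal(0, 0.1, (d, n_hid))
H_seq = bilstm(p_fw, p_bw, W_fw2, W_bw2, np.zeros(d), xs, n_hid)
h_star, beta = attention_pool(H_seq, rng.normal(size=d))
logits = rng.normal(0, 0.1, (n_cls, d)) @ h_star   # hypothetical softmax output layer
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```

Running the backward layer on the reversed sequence and then reversing its outputs makes each backward state depend on x_t, ..., x_T, as a backward layer should; the attention weights `beta` then show which time steps dominate h*.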
The parameters of pedestrian crossing intention are not equally important. To capture the most important information and shorten the flow distance of information, an attention mechanism based on Bi-LSTM was introduced [37]. The grid search method was used to determine the hyperparameter values. When the pedestrian intention is identified 0 s before crossing the zebra crossing, the learning rate, number of hidden units, and dropout values are 0.005, 120, and 0.4, respectively. When the pedestrian intention is recognized 0.5 s before crossing the zebra crossing, they are 0.001, 120, and 0.4, respectively. When the pedestrian intention is recognized 1 s before crossing the zebra crossing, they are 0.001, 100, and 0.2, respectively. Adam was used as the optimizer. Figure 2 presents the four components of the AT-Bi-LSTM framework, namely, (1) the input layer, which inputs the feature parameter sequence of the crossing intention, (2) the LSTM layer, (3) the attention layer, and (4) the output layer.
The correlation function of the attention layer is expressed as follows:
$$Q = \tanh(P), \quad \beta = \operatorname{softmax}\left(c^{\top} Q\right), \quad \varepsilon = P\beta^{\top}, \quad h^{*} = \tanh(\varepsilon) \tag{10}$$
where $P = [h_1, h_2, h_3, \ldots, h_T]$ is the matrix whose columns are the Bi-LSTM outputs, $T$ is the sequence length (the superscript $\top$ denotes transposition), $c$ is a trained parameter vector, and $h^{*}$ is the final value used for classification.
3.1.3. Stacking-Based Ensemble Learning Algorithm Description. The training set of stacking ensemble learning includes a primary training set and a secondary training set. In the training phase, the secondary training set is generated by the base classifiers. If the training set of the primary classifiers were used directly to generate the secondary training set, the risk of overfitting would be relatively high. Therefore, cross-validation is generally used to generate the training samples of the meta-classifier; this paper uses 5-fold cross-validation. Firstly, the base classifiers (SVM, RF, LSTM, and AT-Bi-LSTM) are trained on the primary training set, which is divided into 5 subsets. Secondly, the training set is reconstructed through 5-fold cross-validation to obtain the secondary training set, which is used to train the meta-classifier. Finally, the meta-classifier (Bi-LSTM) is obtained by training on the secondary training set.
Figure 3 presents the framework of stacking-based ensemble learning, and Table 1 gives the pseudocode of the stacking algorithm. The main steps of model training are as follows:
Step 1: divide the pedestrian intention sample dataset $S$ into the training set $S_{\text{train}}$ and the test set $S_{\text{test}}$ at a ratio of 3 : 1. Following the 5-fold cross-validation method, randomly and equally divide $S_{\text{train}}$ into 5 subsets $S_1, S_2, S_3, S_4$, and $S_5$, select one subset $S_i$ ($i = 1, 2, \ldots, 5$) in turn as the validation subset, and use the remaining $S_{+i} = S_{\text{train}} - S_i$ as the training subset.
Step 2: use $S_{+i}$ as the training set of a base classifier (RF first; Step 4 repeats the procedure for SVM, LSTM, and AT-Bi-LSTM), use $S_i$ as the validation subset, and output the prediction result $x_i$ on $S_i$. Simultaneously, predict the test set $S_{\text{test}}$ and output the prediction result $y_i$.
Step 3: repeat Step 2 five times to obtain $\{x_1, x_2, x_3, x_4, x_5\}$, and stack these results to obtain a column vector $X_1$ of the same length as the original training set $S_{\text{train}}$. We combine the test
[Figure 3 diagram: training dataset with labels and test dataset; 5-fold cross-validation grid in which each fold uses one validation subset and four training subsets; RF, SVM, LSTM, and AT-BiLSTM models producing out-of-fold outputs N11–N45 and test-set outputs M1–M4; Bi-LSTM model; model recognition accuracy.]
Figure 3: Stacking ensemble learning architecture: the data are divided into a training set and a test set. In each fold, the training set is divided into four training subsets and one validation subset, and a new subset is obtained from the base classifiers RF, SVM, LSTM, and AT-Bi-LSTM. The new subset is used to train the meta-classifier, yielding the pedestrian crossing intention recognition model. Similarly, a new test set is obtained from the four base classifiers and input into the intention recognition model to obtain the final recognition accuracy.
Table 1: Pseudocode of the stacking algorithm.

Input: training set S_train = {(x1, y1), (x2, y2), ..., (xm, ym)};
       base classifiers: L1, L2, ..., LT;
       meta-classifier: L (Bi-LSTM).
Process:
    for t = 1, 2, ..., T do
        ht = Lt(S_train)          % train each base classifier on the training set
    end for
    N = Ø                         % create the new dataset
    for i = 1, 2, ..., m do
        for t = 1, 2, ..., T do
            zit = ht(xi)          % use classifier ht to predict the validation sample xi
        end for
        N = N ∪ {((zi1, zi2, ..., ziT), yi)}
    end for
    h' = L(N)                     % train the Bi-LSTM meta-classifier on the newly combined dataset
Output: H(x) = h'(h1(x), h2(x), ..., hT(x))
samples and take the average to obtain a column vector 3.2.3. Data Collection and Analysis. To overcome the in-
Y1 of the same length as the original test Stest. fuence of time heterogeneity, all observation experiments
Step 4: by sequentially performing step 3 on the base were conducted on sunny days. Pedestrian crossing inten-
classifers SVM, LSTM, and AT-Bi-LSTM, we obtain tion recognition is a continuous-time series classifcation
X2, X3, and X4 from the original training set and Y2, Y3, problem. Te pedestrians’ crossing intention is determined
and Y4 from the original test set. according to the speed change within a period of time before
the pedestrians cross the zebra crossing or the time series
Step 5: we combine X1, X2, X3, and X4 and the label L of
change of the surrounding environment (vehicle speed or
the original training set Strain to obtain a new sample
the distance between the vehicle and the zebra crossing, etc.).
dataset N � {X1, X2, X3, X4, and L}, and we use it as the
Generally speaking, when pedestrians are crossing the zebra
training dataset of the meta-classifer Bi-LSTM. We
crossing, they determine their intention to cross the zebra
obtain the accuracy of the meta-classifer via the test
crossing by observing the surrounding environment (such as
dataset M � {Y1, Y2, Y3, Y4, and P}.
the distance between the vehicle and themselves), which is
refected in the speed of the pedestrian crossing the zebra
crossing. If the pedestrian does not slow down, it may be a
3.2. Experimental
direct crossing behavior. Figure 7 shows a schematic dia-
3.2.1. Study Site. Figures 4 and 5 are diagrams of the study gram of the pedestrian crossing. In this paper, pedestrian
site and equipment placement location, respectively. Te crossing intentions are divided into three categories, namely,
zebra crossing section has no signal light control and “walking-walking intention (WWI),” “walking-stopping
monitoring equipment. Te width of the zebra crossing is intention (WSI),” and “stopping-walking intention (SWI).”
12 m, a two-way four-lane. Te road gradient is small and WWI refers to a pedestrian crossing the zebra crossing
negligible, and the road is separated by a double yellow line. without stopping after reaching the curb. WSI means that
Tere is no green belt or bufer waiting area. Te selected after considering the road trafc environment, pedestrians
road is a common road in the city. Te trafc fow in this did not choose to cross directly after reaching the curb but
section is mainly composed of small passenger vehicles. waited. SWI means that pedestrians start to cross the zebra
crossing after waiting at the curb.
In this paper, the main process of selecting the char-
acteristic parameters of pedestrian intention before crossing the zebra crossing is as follows.

3.2.2. Experimental Equipment. The laser radar used in this experiment is the LUX4L-4 model produced by the German company IBEO, as shown in Figure 6. It is a four-line radar, and its scanning frequency was set to 12.5 Hz. The lidar has a detection range of 0.3–200 m, a vertical field of view of 3.2°, and a horizontal field of view of up to 110°. It scans all objects within its detection field in real time, including both moving and stationary objects. The data collected by the radar are read through the associated software ILV-Premium, which displays the type, speed, and position of each detected target in real time; the display interface of the software is shown in Figure 6.

The selected HD monitor is small, and its video resolution is 1920 × 1080; Figure 6 shows a photograph of it. Both the LUX radar and the driving recorder (the HD monitor) are powered by small batteries. The data collection location is 15 m away from the zebra crossing. Using the radar alone would miss a large amount of data, making sample selection more complicated, and would not reveal the gender and age of pedestrians. To overcome these problems, the radar and the HD monitor are used together. After the two devices are synchronized in time, the HD monitor is used to determine whether a pedestrian intends to cross the zebra crossing, and the laser radar collects the pedestrian's data before and while crossing. The radar point cloud recorded by ILV-Premium is the primary data source and the video recorded by the HD monitor is auxiliary, which together allow the data to be selected precisely.

The HD monitor is first used to check whether the pedestrian intends to cross the zebra crossing. If the video shows that the pedestrian's intention is WWI, we go back over a certain period of time and use the laser radar to collect the pedestrian-related and vehicle-related data for that period. If the video shows that the pedestrian's intention is WSI or SWI, the same method is used: the laser radar recording is traced back and the data are recorded.

The intention characterization parameters selected in this paper are mainly pedestrian speed, the distance between the pedestrian and the zebra crossing, vehicle speed, the distance between the vehicle and the zebra crossing, and TTC. In addition, the paper considers the influence of pedestrian age, gender, and group on pedestrians' intention to cross the zebra crossing. The specific definitions are as follows:

Pedestrian speed is the mean speed of the pedestrian during a period of time before crossing the zebra crossing, obtained by laser radar. When the radar measures pedestrian speed, the true speed is estimated by Kalman filtering, and the filtered speed values of all frames are averaged to obtain the pedestrian's mean speed before crossing the zebra crossing.

The distance between the pedestrian and the zebra crossing (DPZC) is the square root of the sum of the squares of two quantities: the perpendicular distance between the pedestrian and the curb and the perpendicular distance between the pedestrian and the zebra crossing.
Figure 4: Schematic diagram of the study site: the equipment (camera and laser radar) is placed at the curb, about 15 meters away from the zebra crossing.

Figure 5: Photograph of the study site: the lidar detection angle is 110°, which completely covers the whole road.

Figure 6: Laser radar and HD monitor: the upper part of the picture is the radar map (ILV-Premium display of pedestrians and vehicles with their speeds), and the lower part is the camera map (video player). The two devices are time-synchronized.
Figure 7: Time-series schematic diagram before pedestrians cross the zebra crossing (axes: time (s), from 3.0 to 0, versus position (m)): the dotted line is the curb, and the gray box is the zebra crossing. Time series T refers to the time from the beginning of the pedestrian trajectory to the time when the pedestrian arrives at the zebra crossing (cross time t).

The distance between the vehicle and the zebra crossing (DVZC) refers to the perpendicular distance between the vehicle and the zebra crossing.

TTC is the distance between the vehicle and the zebra crossing divided by the current speed of the vehicle.

3.2.4. Data Preprocessing. The data obtained from the radar contain considerable noise and interference. To bring the collected data closer to the true values, this paper applies a Kalman filter to the raw radar data. In addition, the values of DVZC and vehicle speed are much larger than those of pedestrian speed and DPZC. To capture the key information in the data more accurately, reduce model training time, and improve recognition accuracy, this paper normalizes the characteristic parameters with the min-max function.

4. Experimental Results

4.1. Characteristic Parameter Analysis Results

4.1.1. Time to Collision. Figure 8(a) shows the TTC line chart under different crossing intentions within 2.1 s before crossing the zebra crossing. When the intention is WWI, the TTC at which pedestrians cross the zebra crossing is the largest (the top of the three curves); when the intention is SWI, it is the second largest (the middle curve); and when the intention is WSI, it is the smallest (the bottom curve). Over time, TTC under all three intentions shows a steady downward trend.

Figure 8(b) is a box diagram of TTC under different crossing intentions. When the intentions are WWI, SWI, and WSI, the mean values of TTC are 5.79 s, 5.22 s, and 2.51 s, respectively. One-way analysis of variance (ANOVA) found significant differences in TTC under different intentions (F(2, 1977) = 1719.60, p < 0.001), and the post-hoc test found significant differences in TTC between every pair of intentions (p < 0.001).

Figure 8: TTC under different crossing intentions. (a) Line chart of TTC (s) change with time (2.1 s to 0 s) under different intentions. (b) Box diagram of TTC (s) under different crossing intentions (WWI, SWI, WSI).

4.1.2. Vehicle Speed. Figure 9(a) shows the vehicle speed line chart under different crossing intentions within 2.1 s before crossing the zebra crossing. When the intention is SWI, the vehicle speed when pedestrians cross the zebra crossing is the largest (the top of the three curves); when the intention is WWI, it is the second largest (the middle curve); and when the intention is WSI, it is the smallest (the bottom curve). In general, vehicle speed changes little over time and remains relatively stable.

Figure 9(b) is a box diagram of vehicle speed under different crossing intentions. When the crossing intentions are WWI, SWI, and WSI, the mean values of vehicle speed are 30.61 km/h, 29.94 km/h, and 31.21 km/h, respectively. One-way ANOVA found significant differences in vehicle speed under different intentions (F(2, 1977) = 83.69, p < 0.001). The post-hoc test found no significant difference in vehicle speed between WWI and SWI (p = 0.15 > 0.05), but significant differences between WWI and WSI (p < 0.001) and between SWI and WSI (p < 0.001).

Figure 9: Vehicle speed under different crossing intentions. (a) Line chart of vehicle speed (km/h) change with time under different intentions. (b) Box diagram of vehicle speed (km/h) under different crossing intentions.

4.1.3. Distance between Pedestrian and Zebra Crossing. Figure 10(a) shows the DPZC changes under different crossing intentions within 2.1 s before crossing the zebra crossing. When the intention is WWI, the DPZC when pedestrians cross the zebra crossing is the largest (the top of the three curves); when the intention is WSI, it is the second largest (the middle curve); and when the intention is SWI, it is the smallest (the bottom curve). Over time, DPZC under the WWI and WSI intentions shows a steady downward trend, whereas DPZC under the SWI intention does not change significantly.

Figure 10(b) is a box diagram of DPZC under different crossing intentions. When the crossing intentions are WWI, SWI, and WSI, the mean values of DPZC are 1.05 m, 0.44 m, and 0.18 m, respectively. One-way ANOVA found significant differences in DPZC under different intentions (F(2, 1977) = 2018.46, p < 0.001), and the post-hoc test found significant differences in DPZC between every pair of intentions (p < 0.001).

4.1.4. Pedestrian Speed. Figure 11(a) shows the pedestrian speed changes under different crossing intentions within 2.1 s before crossing the zebra crossing. When the intention is WWI, the pedestrian speed when crossing the zebra crossing is the largest (the top of the three curves); when the intention is WSI, it is the second largest (the middle curve); and when the intention is SWI, it is the smallest (the bottom curve). Over time, pedestrian speed under WWI shows no significant change, pedestrian speed under WSI drops rapidly, and pedestrian speed under SWI rises slowly.

Figure 11(b) is a box diagram of pedestrian speed under different crossing intentions. When the crossing intentions are WWI, SWI, and WSI, the mean values of pedestrian speed are 4.27 km/h, 0.39 km/h, and 2.22 km/h, respectively. One-way ANOVA found significant differences in pedestrian speed under different intentions (F(2, 1977) = 2274.09, p < 0.001), and the post-hoc test found significant differences in pedestrian speed between every pair of intentions (p < 0.001).
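The F statistics quoted in Section 4.1 come from a one-way ANOVA across the three intention groups. The plain-Python sketch below computes F and its degrees of freedom; with k = 3 groups and n = 1980 samples the degrees of freedom are k - 1 = 2 and n - k = 1977, which matches the reported F(2, 1977). It returns only the statistic: the p-value requires the F distribution's survival function (for example, `scipy.stats.f.sf`, or `scipy.stats.f_oneway` for the whole test), and the specific post-hoc procedure is not named here, so neither is reproduced. The function name is illustrative.

```python
def one_way_anova(*groups):
    """One-way ANOVA: return (F, df_between, df_within).

    Each group is a sequence of observations, e.g. TTC values for one intention.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For instance, three groups of sizes 658, 642, and 680 (the per-intention totals of training plus test samples in Table 2) give degrees of freedom (2, 1977).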
Figure 10: DPZC under different crossing intentions. (a) Line chart of DPZC change with time under different intentions. (b) Box diagram of DPZC under different crossing intentions. (Vertical axes: distance between pedestrian and zebra crossing (m).)

Figure 11: Pedestrian speed under different crossing intentions. (a) Line chart of pedestrian speed change with time under different intentions. (b) Box diagram of pedestrian speed under different crossing intentions. (Vertical axes: pedestrian speed (km/h).)
4.1.5. Distance between Vehicle and Zebra Crossing. Figure 12(a) shows the DVZC changes under different crossing intentions within 2.1 s before crossing the zebra crossing. When the intention is WWI, the DVZC when pedestrians cross the zebra crossing is the largest (the top of the three curves); when the intention is SWI, it is the second largest (the middle curve); and when the intention is WSI, it is the smallest (the bottom curve). In general, DVZC under all three intentions shows a steady downward trend over time.

Figure 12(b) is a box diagram of DVZC under different crossing intentions. When the crossing intentions are WWI, SWI, and WSI, the mean values of DVZC are 49.28 m, 45.13 m, and 19.44 m, respectively. One-way ANOVA found significant differences in DVZC under different intentions (F(2, 1977) = 2247.65, p < 0.001), and the post-hoc test found significant differences in DVZC between every pair of intentions (p < 0.001).

Figure 12: DVZC under different crossing intentions. (a) Line chart of DVZC change with time under different intentions. (b) Box diagram of DVZC under different crossing intentions. (Vertical axes: distance between vehicle and zebra crossing (m).)

4.1.6. Age and Gender. Numerous studies have shown that pedestrians' choices when crossing the zebra crossing differ greatly with age and gender. Generally speaking, men choose to cross relatively aggressively, and women relatively cautiously [38, 39]. Pedestrians are usually divided by age into young, middle-aged, and old; when crossing the zebra crossing, elderly pedestrians choose relatively cautiously, while middle-aged pedestrians choose more aggressively. Generally, ages 18–30, 30–59, and >59 are classed as young, middle-aged, and old, respectively [40–42].

4.2. Model Results. Through the analysis in the previous section, the input parameter set of the model is finally determined: TTC, DPZC, DVZC, vehicle speed, pedestrian speed, age, and gender. A total of 1980 sets of valid data are selected, of which 75% are used as the training set and the remaining 25% as the test set. Five-fold cross-validation is applied to the training set. Table 2 shows the number of training and test samples under each intention. In this paper, pedestrian crossing intention recognition models are established at 0 s, 0.5 s, and 1 s before crossing the zebra crossing. Model performance is evaluated by precision, recall, F1 score, the confusion matrix, and the receiver operating characteristic (ROC) curve.

Table 2: Number of intention samples.
Label    Train sample    Test sample
WWI      494             164
SWI      482             160
WSI      510             170

4.2.1. Model Recognition Results at 0 s before Crossing the Zebra Crossing. Table 3 shows the model evaluation results at 0 s before crossing the zebra crossing. Compared with several traditional machine learning algorithms, the pedestrian crossing intention model based on stacking ensemble learning has the highest recognition accuracy, reaching 98.79%. Its precision, recall, and F1 score for identifying WWI are 98.78%, 98.78%, and 98.78%, respectively. Likewise, its precision, recall, and F1 score are 99.38%, 98.76%, and 99.07% for SWI, and 98.24%, 98.82%, and 98.53% for WSI. Overall, the stacking-based model introduced in this paper has the best recognition performance. The running time of the stacking model is 0.0083 s, and the running times of the AT-Bi-LSTM, LSTM, RF, and SVM models are 0.0032 s, 0.0054 s, 0.0065 s, and 0.0046 s, respectively. All of these running times are on the order of milliseconds, which meets practical requirements.

Table 3: Model evaluation result at 0 s before crossing the zebra crossing.
Algorithm     Accuracy (%)   WSI Pr / Re / F1 (%)     WWI Pr / Re / F1 (%)     SWI Pr / Re / F1 (%)
SVM           90.08          88.24 / 89.29 / 88.76    89.63 / 89.63 / 89.63    92.50 / 91.36 / 91.93
RF            92.12          90.59 / 91.67 / 91.12    92.07 / 92.07 / 92.07    93.75 / 92.59 / 93.17
LSTM          93.54          91.76 / 93.41 / 92.58    93.90 / 93.33 / 93.62    95.00 / 93.83 / 94.41
AT-Bi-LSTM    96.15          95.29 / 95.86 / 95.58    96.34 / 95.76 / 96.05    96.88 / 96.88 / 96.88
Stacking      98.79          98.24 / 98.82 / 98.53    98.78 / 98.78 / 98.78    99.38 / 98.76 / 99.07
Note. Pr represents precision, Re represents recall, and F1 represents F1 score.

Figure 13 shows the ROC curve of each model. At a false positive rate of 5%, the stacking-based model has the highest true positive rate, followed by AT-Bi-LSTM, LSTM, RF, and SVM. The area under the ROC curve of the stacking ensemble learning method is also the largest of the five algorithms. In addition, the ROC curves of all five algorithms lie well away from the diagonal y = x, which shows that all five models perform well. A comprehensive comparison shows that the stacking-based pedestrian crossing intention recognition model introduced in this paper performs best.

Figure 13: ROC curves of different models at 0 s before crossing the zebra crossing (false positive rate versus true positive rate; curves: SVM, RF, LSTM, AT-Bi-LSTM, and stacking).

Figure 14 shows the confusion matrices of the five algorithms. The SVM-based model has the most misrecognitions: WWI is recognized as SWI and WSI 6 and 11 times, respectively; SWI is recognized as WWI and WSI 5 and 7 times, respectively; and WSI is recognized as WWI and SWI 12 and 8 times, respectively. In contrast, the stacking-based model has the fewest misrecognitions and the best performance: WWI is recognized as SWI and WSI once each, SWI is recognized as WWI and WSI once each, and WSI is recognized as WWI and SWI 2 times and 1 time, respectively.

4.2.2. Model Recognition Results at 0.5 s before Crossing the Zebra Crossing. Table 4 shows the model evaluation results at 0.5 s before crossing the zebra crossing. Compared with several traditional algorithms, the intention recognition model based on stacking ensemble learning has the highest accuracy, 95.36%; the accuracies of the AT-Bi-LSTM, LSTM, and RF models are 92.12%, 89.30%, and 87.07%, respectively, and the SVM-based model has the lowest recognition accuracy, 85.26%. Table 4 also shows that the precision, recall, and F1 score of the stacking model are markedly higher than those of the other four algorithms, so the stacking ensemble learning method introduced in this paper has the best recognition performance at 0.5 s before crossing the zebra crossing. Compared with Table 3, accuracy decreases to some extent when recognition is performed 0.5 s before crossing, mainly because some key features contained in the sequence data are no longer available. Nevertheless, the accuracy of the model still meets practical requirements. The running time of the stacking model is 0.0076 s, and the running times of the AT-Bi-LSTM, LSTM, RF, and SVM models are 0.0027 s, 0.0060 s, 0.0061 s, and 0.0052 s, respectively.

Figure 15 shows the ROC curve of each model. At a false positive rate of 5%, the stacking-based model has the highest true positive rate, followed by AT-Bi-LSTM, LSTM, RF, and SVM, and its area under the ROC curve is the largest of the five algorithms. Compared with Figure 13, the area under the ROC curve of each algorithm has decreased, and model performance has begun to decline.

Figure 16 shows the confusion matrices of the five algorithms. The SVM-based model still has the most misrecognitions: WWI is recognized as SWI and WSI 10 and 15 times, respectively; SWI is recognized as WWI and WSI 7 and 11 times, respectively; and WSI is recognized as WWI and SWI 19 and 11 times, respectively. In contrast, the stacking-based model has the fewest misrecognitions and the best performance: WWI is recognized as SWI and WSI 3 and 5 times, respectively; SWI is recognized as WWI and WSI 2 and 3 times, respectively; and WSI is recognized as WWI and SWI 7 and 3 times, respectively. Compared with Figure 14, the number of misrecognitions has increased.
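The stacking scheme compared throughout Section 4.2 can be sketched as follows. This is a minimal, dependency-free illustration of stacked generalization (Wolpert [32]) under stated assumptions, not the authors' implementation: the base learners are toy nearest-centroid and k-nearest-neighbour classifiers standing in for RF, SVM, LSTM, and AT-Bi-LSTM; the meta-learner is likewise a placeholder nearest-centroid model; the five folds are formed by interleaving indices rather than by stratification; and min-max scaling is fitted on the training rows only. Every class and function name here is hypothetical.

```python
import math
from collections import Counter


def minmax_fit(rows):
    """Per-feature (min, max) learned from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def minmax_transform(rows, lo, hi):
    """Min-max scale each feature to [0, 1]; constant features map to 0."""
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(r, lo, hi)]
            for r in rows]


class NearestCentroid:
    """Toy learner: predict the class whose mean feature vector is closest."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids = {c: [sum(col) / len(col) for col in zip(*rows)]
                          for c, rows in groups.items()}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: math.dist(r, self.centroids[c]))
                for r in X]


class KNearest:
    """Toy learner: majority vote among the k nearest training rows."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        out = []
        for r in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(r, self.X[i]))
            votes = Counter(self.y[i] for i in order[:self.k])
            out.append(votes.most_common(1)[0][0])
        return out


class Stacking:
    """Stacked generalization with k-fold out-of-fold base predictions."""

    def __init__(self, base_factories, meta_factory, n_folds=5):
        self.base_factories = base_factories
        self.meta_factory = meta_factory
        self.n_folds = n_folds

    def _meta_features(self, preds_per_model):
        # One-hot encode every base learner's predicted class and concatenate.
        return [[1.0 if p == c else 0.0 for p in sample for c in self.classes]
                for sample in zip(*preds_per_model)]

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n = len(X)
        folds = [range(i, n, self.n_folds) for i in range(self.n_folds)]
        oof = [[None] * n for _ in self.base_factories]
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            for m, make in enumerate(self.base_factories):
                model = make().fit([X[i] for i in train], [y[i] for i in train])
                for i, p in zip(fold, model.predict([X[i] for i in fold])):
                    oof[m][i] = p  # prediction from a model that never saw sample i
        self.meta = self.meta_factory().fit(self._meta_features(oof), y)
        # Refit the base learners on all training data for use at test time.
        self.base = [make().fit(X, y) for make in self.base_factories]
        return self

    def predict(self, X):
        preds = [model.predict(X) for model in self.base]
        return self.meta.predict(self._meta_features(preds))
```

The key design point is that the meta-learner is trained only on out-of-fold predictions, so it learns how far to trust each base learner on data that learner has not seen; fitting it on in-sample base predictions would reward overfitted base models.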
Figure 14: Confusion matrix at 0 s before crossing the zebra crossing (rows and columns: WWI, SWI, WSI): the cyan color indicates the number of correct recognitions and their proportion in all samples, and the light red color indicates the number of misrecognitions and their proportion in all samples. The rightmost column in the figure is precision, and the bottom row is recall. (a) Confusion matrix of SVM; (b) confusion matrix of RF; (c) confusion matrix of LSTM; (d) confusion matrix of AT-Bi-LSTM; (e) confusion matrix of stacking.
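The per-class figures in Tables 3 to 5 follow from the confusion matrices by simple ratios. The sketch below computes accuracy and, for each class, the row-wise ratio (the rightmost column of the confusion-matrix figures), the column-wise ratio (the bottom row), and F1. Following the figure captions, the row-wise ratio is the one reported as precision and the column-wise ratio as recall; note that if rows are the true classes (their sums, 164, 160, and 170, match the test counts in Table 2), the conventional names for these two ratios would be recall and precision, respectively. F1, their harmonic mean, is the same either way. The function name is illustrative.

```python
def confusion_metrics(matrix, labels):
    """Accuracy and per-class (row ratio, column ratio, F1) from a square confusion matrix.

    matrix[i][j] counts samples of row class i assigned to column class j.
    """
    total = sum(map(sum, matrix))
    accuracy = sum(matrix[i][i] for i in range(len(matrix))) / total
    per_class = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        row_sum = sum(matrix[i])
        col_sum = sum(row[i] for row in matrix)
        row_ratio = tp / row_sum if row_sum else 0.0  # rightmost column of the figure
        col_ratio = tp / col_sum if col_sum else 0.0  # bottom row of the figure
        # F1 is the harmonic mean of the two ratios, i.e. 2*TP / (row_sum + col_sum).
        f1 = 2 * tp / (row_sum + col_sum) if row_sum + col_sum else 0.0
        per_class[label] = (row_ratio, col_ratio, f1)
    return accuracy, per_class
```

Applied to the SVM matrix of Figure 14(a), with rows WWI, SWI, WSI of 147/6/11, 5/148/7, and 12/8/150, this reproduces the SVM row of Table 3: accuracy 90.08%, and 88.24%, 89.29%, and 88.76% for WSI.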
Table 4: Model evaluation result at 0.5 s before crossing the zebra crossing.
Algorithm     Accuracy (%)   WWI Pr / Re / F1 (%)     SWI Pr / Re / F1 (%)     WSI Pr / Re / F1 (%)
SVM           85.26          84.76 / 84.24 / 84.50    88.75 / 87.12 / 87.93    82.35 / 84.34 / 83.33
RF            87.07          86.59 / 86.59 / 86.59    90.63 / 88.41 / 89.51    84.12 / 86.14 / 85.12
LSTM          89.30          90.24 / 88.62 / 89.43    91.88 / 90.74 / 91.30    87.06 / 88.62 / 87.83
AT-Bi-LSTM    92.12          92.07 / 91.52 / 91.79    93.75 / 93.17 / 93.46    90.59 / 91.67 / 91.12
Stacking      95.36          95.12 / 94.55 / 94.83    96.88 / 96.27 / 96.57    94.12 / 95.24 / 94.67
Figure 15: ROC curves of different models at 0.5 s before crossing the zebra crossing (false positive rate versus true positive rate; curves: SVM, RF, LSTM, AT-Bi-LSTM, and stacking).
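The ROC comparisons in this section (area under the curve, and the true positive rate reached at a 5% false positive rate) can be computed one class at a time. The sketch below assumes a one-vs-rest reduction, with each intention treated as the positive class in turn, and continuous per-class scores such as predicted probabilities; the construction of the three-class ROC curves is not stated here, so both choices are assumptions, and the function names are illustrative.

```python
def roc_points(scores, is_positive):
    """One-vs-rest ROC curve as (FPR, TPR) points, strictest threshold first.

    scores: a model's score for the class of interest (e.g. a predicted probability).
    is_positive: True where the sample truly belongs to that class.
    """
    n_pos = sum(1 for t in is_positive if t)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative samples")
    ranked = sorted(zip(scores, is_positive), key=lambda s: s[0], reverse=True)
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr=0.05):
    """Highest true positive rate whose false positive rate does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)
```

Grouping tied scores matters: stepping through tied samples one at a time would make the curve, and hence the area, depend on the arbitrary order of the ties.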
Figure 16: Confusion matrix at 0.5 s before crossing the zebra crossing (panels (a)–(e); rows and columns: WWI, SWI, WSI): the cyan color indicates the number of correct recognitions and their
Table 5: Model evaluation result at 1 s before crossing the zebra crossing.
Algorithm     Accuracy (%)   WWI Pr / Re / F1 (%)     SWI Pr / Re / F1 (%)     WSI Pr / Re / F1 (%)
SVM           76.33          76.22 / 75.30 / 75.76    82.50 / 78.57 / 80.49    70.59 / 75.00 / 76.22
RF            78.35          78.05 / 77.58 / 77.81    83.75 / 80.72 / 82.21    73.53 / 76.69 / 78.05
LSTM          81.18          81.71 / 79.76 / 80.72    86.25 / 83.64 / 84.92    75.88 / 80.12 / 81.71
AT-Bi-LSTM    85.23          85.98 / 83.43 / 84.68    90.00 / 87.80 / 88.89    80.00 / 84.47 / 85.98
Stacking      89.27          89.63 / 88.02 / 88.82    93.13 / 90.85 / 91.98    85.88 / 88.48 / 89.63
Figure 17: ROC curves of different models at 1 s before crossing the zebra crossing (false positive rate versus true positive rate; curves: SVM, RF, LSTM, AT-Bi-LSTM, and stacking).

Figure 18: Confusion matrix at 1 s before crossing the zebra crossing (panels (a)–(e); rows and columns: WWI, SWI, WSI): the cyan color indicates the number of correct recognitions and their

4.2.3. Model Recognition Results at 1 s before Crossing the Zebra Crossing. Table 5 shows the model evaluation results at 1 s before crossing the zebra crossing. Compared with several traditional algorithms, the intention recognition model based on stacking ensemble learning has the highest accuracy, 89.27%; the accuracies of the AT-Bi-LSTM, LSTM, and RF models are 85.23%, 81.18%, and 78.35%, respectively, and the SVM-based model has the lowest recognition accuracy, 76.33%. Table 5 also shows that the precision, recall, and F1 score of the stacking model are markedly higher than those of the other four algorithms, so the stacking ensemble learning method introduced in this paper has the best recognition performance at 1 s before crossing the zebra crossing. Compared with Tables 3 and 4, accuracy decreases when recognition is performed 1 s before crossing, mainly because most of the key features contained in the sequence data are no longer available. However, the method introduced in this paper still has high
recognition accuracy. The running time of the stacking model is 0.0094 s, and the running times of the AT-Bi-LSTM, LSTM, RF, and SVM models are 0.0035 s, 0.0059 s, 0.0071 s, and 0.0040 s, respectively.

Figure 17 shows the ROC curve of each model. At a false positive rate of 5%, the stacking-based model has the highest true positive rate, over 80%, whereas the true positive rates of the remaining four algorithms drop markedly, to below 80%. The area under the ROC curve of the stacking model is also the largest of the five algorithms. Compared with Figures 13 and 15, the area under the ROC curve of each algorithm has decreased.

Figure 18 shows the confusion matrices of the five algorithms. The SVM-based model has the most misrecognitions, whereas the stacking-based model has the fewest misrecognitions and the best performance. Compared with Figures 14 and 16, the number of misrecognitions has increased markedly.

5. Conclusions

This paper first collected the motion parameters of pedestrians and vehicles with a laser radar and an HD monitor and selected 1980 valid samples. Statistical methods were then used to obtain a characteristic parameter set that reflects pedestrians' crossing intention. Finally, with this parameter set as the input of the stacking ensemble learning method, a pedestrian crossing intention model with high recognition accuracy was trained and compared with traditional machine learning algorithms. The results show that the accuracy of the stacking-based pedestrian crossing intention recognition model is 98.79% when recognition is performed 0 s before crossing the zebra crossing, 95.36% at 0.5 s before crossing, and 89.27% at 1 s before crossing. Compared with traditional machine learning algorithms, the method introduced in this paper has the best recognition performance. Its high intention recognition accuracy is of practical significance for future fully autonomous vehicles, helping them to avoid human-vehicle conflicts effectively and to improve the efficiency of urban road driving.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the Postdoctoral Startup Fund under grant number 522010 and in part by the National Natural Science Foundation Project, 52102465.

References

[1] C. Wang, H. Zhang, H. Wang, and R. Fu, “The effect of ‘yield to pedestrians’ policy enforcement on pedestrian street crossing behavior: a 3-year case study in Xi’an, China,” Travel Behaviour and Society, vol. 24, pp. 172–180, 2021.
[2] H. Zhang, Y. Guo, Y. Chen, Q. Sun, and C. Wang, “Analysis of pedestrian street-crossing decision-making based on vehicle deceleration-safety gap,” International Journal of Environmental Research and Public Health, vol. 17, no. 24, p. 9247, 2020.
[3] Traffic Administration Bureau of the Ministry of Public Security of the People’s Republic of China, Annual Report of Road Traffic Accident Statistics of the People’s Republic of China, Beijing, 2020.
[4] Traffic Administration Bureau of the Ministry of Public Security of the People’s Republic of China, Annual Report of Road Traffic Accident Statistics of the People’s Republic of China, Beijing, 2019.
[5] Traffic Administration Bureau of the Ministry of Public Security of the People’s Republic of China, Annual Report of Road Traffic Accident Statistics of the People’s Republic of China, Beijing, 2018.
[6] Traffic Administration Bureau of the Ministry of Public Security of the People’s Republic of China, Annual Report of Road Traffic Accident Statistics of the People’s Republic of China, Beijing, 2017.
[7] Traffic Administration Bureau of the Ministry of Public Security of the People’s Republic of China, Annual Report of Road Traffic Accident Statistics of the People’s Republic of China, Beijing, 2016.
[8] J. Zhao, Y. Tang, and Y. Han, “Gap acceptance probability model for pedestrians at unsignalized mid-block crosswalks based on logistic regression,” Accident Analysis & Prevention, vol. 129, pp. 76–83, 2019.
[9] J. Zhao, J. O. Malenje, J. Wu, and R. Ma, “Modeling the interaction between vehicle yielding and pedestrian crossing behavior at unsignalized midblock crosswalks,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 73, pp. 222–235, 2020.
[10] US Department of Transportation, “Automated driving systems 2.0,” A Vision for Safety, vol. 24, p. 57, 2017.
[11] SAE-China, “Driverless technology roadmap,” Safety Now, vol. 67, p. 435, 2016.
[12] B. Yang and R. Ni, “Vision-based recognition of pedestrian crossing intention in an urban environment,” in Proceedings of the 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems, pp. 992–995, Suzhou, China, 29 July–2 August 2019.
[13] S. Kalantarov, R. Riemer, and T. Oron-Gilad, “Pedestrians road crossing decisions and body parts movements,” Transportation Research Part F: Traffic Psychology and Behaviour, vol. 53, pp. 155–171, 2018.
[14] R. Mínguez, I. Alonso, D. Fernández-Llorca, and M. Sotelo, “Pedestrian path, pose, and intention prediction through Gaussian process dynamical models and pedestrian activity recognition,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 5, pp. 1803–1814, 2018.
[15] R. Quintero, I. Parra, J. Lorenzo, D. Fernández-Llorca, and M. Sotelo, “Pedestrian intention recognition by means of a Hidden Markov Model and body language,” in Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems, pp. 1–7, Yokohama, Japan, 16–19 October 2017.
[16] R. Quintero, I. Parra, D. Llorca, and M. Sotelo, “Pedestrian path prediction based on body language and action classification,” in Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems, pp. 679–684, Qingdao, China, 2010.
[17] Z. Fang and A. M. Lopez, “Intention recognition of pedestrians and cyclists by 2D pose estimation,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 11, pp. 4773–4783, 2020.
[18] R. D. Brehar, M. P. Muresan, T. Mariţa, C. C. Vancea, M. Negru, and S. Nedevschi, “Pedestrian street-cross action recognition in monocular far infrared sequences,” IEEE Access, vol. 9, pp. 74302–74324, 2021.
[19] A. M. Căilean, C. Beguni, S. A. Avătămăniţei, M. Dimian, and V. Popa, “Design, implementation and experimental investigation of a pedestrian street crossing assistance system based on visible light communications,” Sensors, vol. 22, no. 15, p. 5481, 2022.
[20] B. Völz, H. Mielenz, I. Gilitschenski, R. Siegwart, and J. Nieto, “Inferring pedestrian motions at urban crosswalks,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 2, pp. 544–555, 2019.
[21] F. Camara, N. Merat, and C. Fox, “A heuristic model for pedestrian intention estimation,” in Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference, pp. 3708–3713, Auckland, New Zealand, 27–30 October 2019.
[22] J. Zhao, Y. Li, H. Xu, and H. Liu, “Probabilistic prediction of pedestrian crossing intention using roadside LiDAR data,” IEEE Access, vol. 7, pp. 93781–93790, 2019.
[23] H. Zhang, Y. Liu, C. Wang, R. Fu, Q. Sun, and Z. Li, “Research on a pedestrian crossing intention recognition model based on natural observation data,” Sensors, vol. 20, no. 6, p. 1776, 2020.
[24] O. Ghori, R. Mackowiak, M. Bautista et al., “Learning to forecast pedestrian intention from pose dynamics,” in Proceedings of the 2018 IEEE Intelligent Vehicles Symposium, pp. 1277–1284, Changshu, China, 26–30 June 2018.
[25] A. Schulz and R. Stiefelhagen, “A controlled interactive multiple model filter for combined pedestrian intention recognition and path prediction,” in Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems, Gran Canaria, Spain, 15–18 September 2015.
[26] N. Brouwer, H. Kloeden, and C. Stiller, “Comparison and evaluation of pedestrian motion models for vehicle safety systems,” in Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems, Rio de Janeiro, Brazil, 1–4 November 2016.
[27] Y. Hashimoto, Y. Gu, M. Iryo-Asano, and S. Kamijo, “A probabilistic model of pedestrian crossing behavior at signalized intersections for connected vehicles,” Transportation Research Part C: Emerging Technologies, vol. 71, pp. 164–181, 2016.
[28] F. Schneemann and P. Heinemann, “Context-based detection of pedestrian crossing intention for autonomous driving in urban environments,” in Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, Daejeon, South Korea, 9–14 October 2016.
[29] T. Dietterich, “Ensemble learning,” The Handbook of Brain Theory and Neural Networks, vol. 2, no. 1, pp. 110–125, 2002.
[30] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[31] Y. Freund, R. Schapire, and N. Abe, “A short introduction to boosting,” Journal of Japanese Society for Artificial Intelligence, vol. 14, pp. 771–780, 1999.
[32] D. H. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, no. 2, pp. 241–259, 1992.
[33] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[34] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[35] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[36] A. Graves, S. Fernández, and J. Schmidhuber, “Bidirectional LSTM networks for improved phoneme classification and recognition,” in International Conference on Artificial Neural Networks, pp. 799–804, Springer, Berlin, Heidelberg, 2005.
[37] A. Vaswani, N. Shazeer, N. Parmar et al., “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 57, p. 758, 2017.
[38] Y. Pei and S. Feng, “Research on design speed of urban pedestrian crossing,” Journal of Highway and Transportation Research and Development, vol. 23, no. 9, pp. 104–107, 2006.
[39] Y. Guo, “Road traffic safety and business management manual,” Science and Technology Press, vol. 689, p. 26, 2002.
[40] K. V. R. Ravishankar and P. M. Nair, “Pedestrian risk analysis at uncontrolled midblock and unsignalised intersections,” Journal of Traffic and Transportation Engineering, vol. 5, no. 2, pp. 137–147, 2018.
[41] X. Zhuang and C. Wu, “Modeling pedestrian crossing paths at unmarked roadways,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1438–1448, 2013.
[42] J. Zhao, Y. Tang, and Y. Han, “Gap acceptance probability model for pedestrians at unsignalized mid-block crosswalks based on logistic regression,” Accident Analysis & Prevention, vol. 129, pp. 76–83, 2019.


[Figure: overall research framework. Panel labels: Laser radar, HD monitor; Sequence data preprocessing (filtering, normalization); Statistical analysis of feature parameters of pedestrian crossing intention recognition (distance between pedestrian and zebra crossing, distance between vehicle and zebra crossing, TTC, vehicle speed, pedestrian speed, gender); Base-classifier: RF, SVM, LSTM, AT-BiLSTM; Meta-classifier: Bi-LSTM (stacking-based); Pedestrian crossing intention recognition based on natural observation data.]
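The framework lists filtering and normalization of the sequence data, and TTC among the feature parameters, but this section does not state which filter, which normalization, or which TTC definition was used. The following is a minimal sketch under those assumptions: a centered moving-average filter, min-max normalization to [0, 1], and TTC taken as the vehicle's distance to the zebra crossing divided by its speed. The functions `moving_average`, `min_max_normalize`, and `time_to_collision` are hypothetical helpers, not the authors' implementation.

```python
from typing import List, Sequence


def moving_average(signal: Sequence[float], window: int = 3) -> List[float]:
    """Centered moving-average filter (an assumed stand-in for the paper's filter).

    At the edges the window is truncated, so each output is the mean of the
    samples that actually fall inside the window.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed: List[float] = []
    for i in range(len(signal)):
        start = max(0, i - half)
        stop = min(len(signal), i + half + 1)
        chunk = signal[start:stop]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def min_max_normalize(signal: Sequence[float]) -> List[float]:
    """Rescale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    low, high = min(signal), max(signal)
    if high == low:
        return [0.0 for _ in signal]
    return [(x - low) / (high - low) for x in signal]


def time_to_collision(distance_m: float, speed_kmh: float) -> float:
    """Hypothetical TTC: vehicle distance to the zebra crossing (m) over its speed (m/s)."""
    speed_ms = speed_kmh / 3.6  # convert km/h to m/s
    if speed_ms <= 0:
        return float("inf")  # a stopped vehicle never reaches the crossing
    return distance_m / speed_ms


if __name__ == "__main__":
    raw_speed = [4.0, 4.4, 3.9, 4.8, 4.1]  # illustrative pedestrian speeds (km/h), not study data
    print(min_max_normalize(moving_average(raw_speed)))
    print(time_to_collision(30.0, 36.0))  # vehicle 30 m away travelling at 36 km/h
```

Filtering before normalizing keeps a single noisy peak from setting the upper bound of the [0, 1] range; whether the study applied the steps in this order is not stated here.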


$$h^{fw}_t = H\left(W^{fw} x_t + W^{fw}_1 h^{fw}_{t-1} + b^{fw}\right) \tag{7}$$
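Equation (7) gives the forward-direction hidden state of the Bi-LSTM. In the standard Bi-LSTM formulation the backward direction applies the same recursion from the last time step to the first, and the two directions' hidden states are concatenated at each step. The pure-Python sketch below illustrates only that bidirectional recursion. In a real Bi-LSTM, H is the full LSTM cell function (gates plus cell state); here, as a plainly stated simplification, H is an elementwise function defaulting to `tanh`, which turns the sketch into a simple bidirectional RNN. Initial states are assumed to be zero, and `step` and `bidirectional_states` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Tuple[Matrix, Matrix, Vector]  # (W, W_1, b) for one direction


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def step(params: Params, x_t: Sequence[float], h_prev: Sequence[float],
         h_fn: Callable[[float], float]) -> Vector:
    """One update h_t = H(W x_t + W_1 h_prev + b), with H applied elementwise."""
    w, w_1, b = params
    pre = [a + r + c for a, r, c in zip(matvec(w, x_t), matvec(w_1, h_prev), b)]
    return [h_fn(z) for z in pre]


def bidirectional_states(xs: Sequence[Sequence[float]], fw: Params, bw: Params,
                         hidden: int,
                         h_fn: Callable[[float], float] = math.tanh) -> List[Vector]:
    """Forward pass over t = 1..T, backward pass over t = T..1, zero initial states.

    Returns the concatenation [h_fw_t, h_bw_t] for every time step.
    """
    forward: List[Vector] = []
    h = [0.0] * hidden
    for x_t in xs:
        h = step(fw, x_t, h, h_fn)
        forward.append(h)
    backward: List[Vector] = [[] for _ in xs]
    h = [0.0] * hidden
    for t in range(len(xs) - 1, -1, -1):
        h = step(bw, xs[t], h, h_fn)
        backward[t] = h
    return [f + b for f, b in zip(forward, backward)]


if __name__ == "__main__":
    # Toy 1-D input and 2-unit hidden state; weights are arbitrary illustrative values.
    xs = [[0.5], [1.0], [-0.5]]
    fw = ([[0.8], [-0.3]], [[0.1, 0.0], [0.0, 0.2]], [0.0, 0.1])
    bw = ([[0.4], [0.6]], [[0.2, 0.1], [0.0, 0.3]], [0.05, 0.0])
    for t, state in enumerate(bidirectional_states(xs, fw, bw, hidden=2)):
        print(t, [round(v, 3) for v in state])
```

Because each concatenated state combines a summary of the past (forward) with a summary of the future (backward), every time step in a recognition window can draw on the whole observed sequence.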

The Bi-LSTM model is then updated from the backward

[Figure: stacking framework. Panel labels: 5-fold cross validation; RF model, SVM model, LSTM model, AT-BiLSTM model; fold outputs N11 to N44; M1, M2, M3, M4; Test dataset; Model recognition.]

Table 1: Pseudocode of the stacking algorithm.
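Table 1 and the stacking framework figure describe base models whose 5-fold cross-validation predictions feed a meta-classifier. The generic stacked-generalization procedure (Wolpert [32]) can be sketched as follows. Every sample's meta-features come from base models trained on folds that excluded it. The base models are then refit on all training data for use at prediction time. This is a simplified illustration, not the authors' code. The base learners are toy nearest-centroid and 1-nearest-neighbour classifiers standing in for RF, SVM, LSTM, and AT-BiLSTM. The meta-learner is also a nearest-centroid classifier rather than the meta-classifier used in the paper, base outputs are one-hot encoded labels rather than probabilities, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Predictor = Callable[[Point], str]
Learner = Callable[[List[Point], List[str]], Predictor]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(xs: List[Point], ys: List[str]) -> Predictor:
    """Toy learner (stand-in, not RF/SVM/LSTM): predict the class with the closest mean."""
    groups: Dict[str, List[Point]] = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    centroids = {c: [sum(col) / len(pts) for col in zip(*pts)] for c, pts in groups.items()}
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def one_nn(xs: List[Point], ys: List[str]) -> Predictor:
    """Toy learner (stand-in): predict the label of the closest training point."""
    data = list(zip(xs, ys))
    return lambda x: min(data, key=lambda d: sq_dist(x, d[0]))[1]


def one_hot(labels: Sequence[str], classes: Sequence[str]) -> List[float]:
    """Concatenate a one-hot vector for each base model's predicted label."""
    return [1.0 if label == c else 0.0 for label in labels for c in classes]


def stack_fit(xs: List[Point], ys: List[str], base: List[Learner], meta: Learner,
              classes: Sequence[str], k: int = 5) -> Predictor:
    """Stacked generalization with k-fold out-of-fold base predictions."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= number of samples")
    meta_x: List[List[float]] = [[] for _ in range(n)]
    for fold in range(k):
        held_out = list(range(fold, n, k))  # interleaved folds
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        models = [fit([xs[i] for i in train], [ys[i] for i in train]) for fit in base]
        for i in held_out:
            # Meta-features for sample i come only from models that never saw it.
            meta_x[i] = one_hot([m(xs[i]) for m in models], classes)
    meta_model = meta(meta_x, ys)
    full = [fit(xs, ys) for fit in base]  # refit base learners on all training data
    return lambda x: meta_model(one_hot([m(x) for m in full], classes))


if __name__ == "__main__":
    xs = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
          [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]  # toy 2-D features
    ys = ["A"] * 5 + ["B"] * 5  # hypothetical class labels
    predict = stack_fit(xs, ys, [nearest_centroid, one_nn], nearest_centroid, ["A", "B"])
    print(predict([0.2, 0.3]), predict([5.8, 5.1]))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for samples the base models were fit on, keeps it from learning to trust base models that merely memorized the training set.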

Laser radar: Ilv-Premium

[Figure: statistical analysis of feature parameters. Axis labels: distance between pedestrian and zebra crossing; pedestrian speed (km/h); distance between vehicle and zebra crossing.]

Table 3: Model evaluation result at 0 s before crossing the zebra crossing.


WWI SWI WSI | WWI SWI WSI | WWI SWI WSI
147 6 11 89.63% | 151 5 8 92.07% | 154 4 6 93.90%
5 148 7 91.93% | 4 150 6 93.75% | 3 152 5 95.00%
12 8 150 88.24% | 9 7 154 90.58% | 8 6 156 91.76%

WWI SWI WSI | WWI SWI WSI
158 2 4 96.34% | 162 1 1 98.78%
2 155 3 96.88% | 1 159 1 99.38%
5 3 162 95.29% | 2 1 167 98.24%
95.76% 96.88% 95.86% 96.15% | 98.78% 98.75% 98.82% 98.79%
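The percentages in these confusion matrices can be checked arithmetically. If rows are read as true classes and columns as predicted classes (an interpretation inferred from the numbers, not stated in this section), each row-end percentage equals the diagonal count over the row total, i.e. per-class recall. The extra row of percentages matches the diagonal count over the column total, i.e. per-class precision, and its last value matches overall accuracy. For the first matrix (rows 158 2 4, 2 155 3, 5 3 162): 158/164 = 96.34%, 158/165 = 95.76%, and (158 + 155 + 162)/494 = 475/494 = 96.15%. The following short sketch uses a hypothetical helper, `confusion_metrics`:

```python
from typing import List, Sequence, Tuple


def confusion_metrics(cm: Sequence[Sequence[int]]) -> Tuple[List[float], List[float], float]:
    """Per-class recall and precision plus overall accuracy, all in percent.

    Assumes rows are true classes and columns are predicted classes, in the
    same class order, so the diagonal holds the correctly recognized samples.
    """
    n = len(cm)
    diag = [cm[i][i] for i in range(n)]
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    recall = [100.0 * d / t for d, t in zip(diag, row_totals)]
    precision = [100.0 * d / t for d, t in zip(diag, col_totals)]
    accuracy = 100.0 * sum(diag) / sum(row_totals)
    return recall, precision, accuracy


if __name__ == "__main__":
    # First matrix of the block above; class order WWI, SWI, WSI as in its header.
    cm = [[158, 2, 4], [2, 155, 3], [5, 3, 162]]
    recall, precision, accuracy = confusion_metrics(cm)
    print("recall   :", [f"{v:.2f}%" for v in recall])
    print("precision:", [f"{v:.2f}%" for v in precision])
    print(f"accuracy : {accuracy:.2f}%")
```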

WWI SWI WSI | WWI SWI WSI
139 10 15 84.76% | 142 9 13 86.59%
7 142 11 88.75% | 5 145 10 90.63%
19 11 140 82.35% | 17 10 143 84.12%
88.62% 90.74% 88.62% 89.30%

WWI SWI WSI | WWI SWI WSI
151 5 8 92.07% | 156 3 5 95.12%
4 150 6 93.75% | 2 155 3 96.88%
10 6 154 90.59% | 7 3 160 94.12%
91.52% 93.17% 91.67% 92.12% | 94.55% 96.27% 95.23% 95.36%

Table 5: Model evaluation result at 1 s before crossing the zebra crossing.


WWI SWI WSI | WWI SWI WSI | WWI SWI WSI
125 16 23 76.22% | 128 14 22 78.05% | 134 11 19 81.70%
11 132 17 82.50% | 10 134 16 83.75% | 9 138 13 86.25%
30 20 120 70.58% | 27 18 125 73.53% | 25 16 129 75.88%