Elsarticle Template
Publication date
09-06-2023
Licence
This work is made available under the CC BY-NC-ND 4.0 licence and should only be used in accordance with
that licence. For more information on the specific terms, consult the repository record for this item.
Document Version
Accepted version
Published in
Neurocomputing
Xi Dai^a, Yuxin Mao^a, Tianpeng Huang^a, Na Qin^a, Deqing Huang^{a,∗}, Yanan Li^b
a School of Electrical Engineering, Southwest Jiaotong University, Chengdu, China
b School of Engineering and Informatics, University of Sussex, Brighton, BN1 9RH, UK.
Abstract
In this paper, a CNN-based learning scheme is proposed to enable a quadrotor
unmanned aerial vehicle (UAV) to avoid obstacles automatically in unknown
and unstructured environments. In order to reduce the decision delay and to
improve the robustness of the UAV, a two-stage end-to-end obstacle avoidance architecture is designed, in which only a forward-facing monocular camera is used. In the first stage, a convolutional neural network (CNN)-based model
is adopted as the prediction mechanism. Utilizing three effective operations,
namely depthwise convolution, group convolution and channel split, the model
predicts the steering angle and the collision probability simultaneously. In the
second stage, the control mechanism maps the steering angle to an instruction
that changes the yaw angle of the UAV. Consequently, when the UAV encoun-
ters an obstacle, it can avoid collision by steering automatically. Meanwhile,
the collision probability is mapped to a forward speed command that either maintains forward flight or stops it. The presented automatic obstacle avoidance scheme for the quadrotor UAV is verified by several indoor/outdoor tests, where its feasibility
and efficacy have been demonstrated clearly. The novelties of the method lie
in its low sensor requirement, light-weight network structure, strong learning
ability and environmental adaptability.
Keywords: Obstacle avoidance, Unmanned aerial vehicle, Convolutional
neural network, Collision probability.
1. Introduction
∗ Corresponding author
Email address: [email protected] (Deqing Huang)
Figure 1: A UAV flies through a grove of trees. The work focuses on obstacle avoidance
of a UAV in a social environment.
Learning-based methods demonstrate advanced performance in various machine vision tasks. Since the surroundings can be intuitively perceived through the images captured by the UAV, machine vision-based methods are suitable for robots to avoid obstacles [7, 8]. Research on autonomous UAVs has employed deep learning algorithms [9, 10, 11] and shown notable positive results. Indeed, the development of machine learning has enhanced the performance of vision-based obstacle avoidance, because learning-based perception approaches enable feature extraction by tuning the parameters during training.
In this direction, several reinforcement learning (RL)-based approaches have been developed to improve the robustness of obstacle avoidance. Sanket et al. [12] proposed a method to fly through unknown gaps with only a monocular camera and onboard sensing, demonstrated through experimental results. Kaufmann [13]
proposed a deep-learning-based approach which showed fast adaptation to an
approximated map. Singla et al. [14] designed a UAV control algorithm to
combine information obtained over a period of time, which could improve the
accuracy of the decision. However, this approach limits the scope of applications, as the UAV needs to fly in the same workspace several times, which hinders the deployment of the UAV's operations in safety-critical environments.
In contrast, supervised-learning methods offer a more viable way to learn ef-
fective flying policies, but these methods still leave the issue of collecting enough
expert trajectories for imitation. Collision-free trajectories flown by human expert pilots are necessary to teach the robotic platform how to behave in dangerous situations [3]. Additionally, as pointed out by [3, 10], in order to ensure that
the UAV has a better knowledge of how to control the direction in flight, the
steering angle can be provided by expert pilots. When using deep learning in flying tasks, both the computational complexity and the tracking accuracy have to be considered. In particular, the obstacle avoidance task aims
at obtaining the best accuracy under a limited computational budget, provided
by certain hardware (e.g. a mobile device). Considering the requirements, the
front-facing monocular camera can be used as the primary sensor for the UAV,
which has a low cost in terms of computation and power.
Recently, as an effective alternative, deep learning offers a way to connect
perception and control, which achieves impressive results [15, 16, 17, 18]. Giusti
et al. [10] proposed a CNN-based algorithm that takes one image from the front-facing camera as input and outputs one of three commands, i.e. go straight, turn left or turn right, and demonstrated its advantages for flying a quadrotor through forest trails. Three head-mounted cameras (pointing straight, left and right) were used to collect data, and the label of each image is the direction of the camera that took it. This means that the directional control output
is also learnt by the network and tailored to one particular environment [19].
In summary, to achieve the objective of obstacle avoidance without a 3D
map, it is essential to control the yaw angle and the forward/backward speed
of the UAV. For this purpose, an automatic obstacle avoidance system based
on CNN is proposed for UAV in this paper. The CNN-based network works for
predicting the quadrotor’s steering angle and the collision probability as a front-
end stage, and the control commands are obtained by mapping in a back-end
stage. The processing speed is taken into account when evaluating the network
that is to be used. The system is shown to perform well in both outdoor and
indoor environments. The research contributions are highlighted as follows.
1. Several representative state-of-the-art networks are investigated, and a novel model is proposed that achieves credible results and satisfies the real-time requirements of the UAV. The effectiveness of the proposed model is demonstrated on real-world driving datasets and collision datasets.
2. A first-order Butterworth low-pass filter is adopted to map the predicted results of the networks to the control instructions, which makes the control process smoother and more responsive.
3. A pitch angle control mechanism is introduced in a deterministic arbitration scheme, to enable the UAV to perceive the environment in 3D space. In this method, an additional prediction on the top part of an image is mapped to a pitch angle command instead of controlling the forward speed. Unlike most other works where the system continuously estimates the pitch angle, our mechanism is triggered only when the UAV stops with the collision probability approximately equal to 1.
The obstacle avoidance system proposed in this work is well suited to obstacle avoidance tasks. The CNN model and filter, which are used to generate control commands, enable a real-time, smooth and reliable response to the UAV's situation. Owing to datasets collected from various scenes that differ in lighting, location, etc., the proposed system is versatile across diverse environments. Meanwhile, three degrees of freedom of the UAV are controlled simultaneously to avoid dead ends and to make the flight more flexible.
2. Related Works
where the number of parameters and the computational burden of these models
were remarkably reduced under the premise of ensuring a good performance.
SqueezeNet [23] proposed a fire module that used only 1/50 of the parameters while achieving the same accuracy as AlexNet [25] on ImageNet. MobileNets [20] used depth-wise separable convolutions, width multipliers and resolution multipliers to reduce model size and latency. ShuffleNet [26] proposed channel shuffle and employed point-wise group convolution, achieving lower complexity than MobileNets [20]. MobileNet-V2 [24], with its inverted residual block and ReLU6, achieved the same accuracy as ShuffleNet [26] with fewer parameters. ShuffleNet-V2 [27] designed the network structure based on four guidelines and was shown to be faster with an accuracy comparable to MobileNet-V2 [24]. However, the computational burden of these models is still high when applied to the UAV's obstacle avoidance. Also, part of the computation is actually not required, since blocks with a smaller number of layers can yield reliable results for the UAV's obstacle avoidance.
with depth images, these improve resilience to environments, thereby allowing
simple and easy collection of indoor/outdoor scenes for use in training.
For autonomous UAV obstacle avoidance, some basic requirements for image
processing include the following characteristics.
1. Accuracy. Specifically, the prediction should achieve nearly 100% recall with high accuracy.
2. Speed. The prediction should ensure real-time processing and a fast in-
ference speed to reduce the latency of the UAV control loop.
[Figure: system overview — raw images from scenes such as roads, curves and jungles are preprocessed and fed to the CNN network, whose steering-angle and collision-probability outputs drive the control system of the Parrot Bebop 2.]
Moreover, the UAV’s model and coordinate system are shown in Figure 3.
The flight states are observed by the UAV's front-facing camera. To train a policy for the CNN model, datasets labelled with collision probability are utilized. Despite the separate policies, the two operating states share the same network structure, so the added computation is minor. Each step will be introduced in detail in the remainder of this section.
To solve this problem, the number of parameters and the memory access cost (MAC) are used to evaluate the computational complexity of the model. MAC is vital for evaluating performance, as discussed in [34]. According to [27], to achieve a practical design of an efficient model with a relatively low MAC, the following guidelines are suggested:
• equal channel width minimizes MAC;
• excessive group convolution increases MAC;
• network fragmentation reduces the degree of parallelism;
• element-wise operations (such as adding an activation function, adding a bias, etc.) have non-negligible effects on MAC.
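As an illustration of the first guideline, for a 1 × 1 convolution with c1 input channels, c2 output channels and an h × w feature map, the MAC can be counted, following [27], as hw(c1 + c2) + c1·c2; for a fixed FLOP budget hw·c1·c2 this is minimized when c1 = c2. A small sketch of this counting (the channel values are only illustrative):

```python
def mac_pointwise_conv(h, w, c1, c2):
    """Approximate memory access cost of a 1x1 convolution, counted as in [27]:
    read the h*w*c1 input, write the h*w*c2 output, access the c1*c2 weights."""
    return h * w * (c1 + c2) + c1 * c2

# Same FLOPs (h*w*c1*c2), different MAC: balanced channel widths are cheaper to access.
print(mac_pointwise_conv(28, 28, 96, 96))   # 159744
print(mac_pointwise_conv(28, 28, 48, 192))  # 197376
```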
In this paper, the model is initialized by a 3 × 3 convolution layer with stride 2, followed by a max-pooling layer with stride 2. The building blocks are organized into three stages with different feature-map sizes: 28 × 28 in the first stage, 14 × 14 in the second and 7 × 7 in the last. The blocks are adopted from ShuffleNet V2 and are shown in Figure 5. At the beginning of each block, a spatial downsampling unit is used to change the output size, and it is followed by basic units. Considering the real-time and accuracy requirements, the numbers of basic units in the three blocks are set to 2, 4 and 2, respectively. After the last channel-shuffle layer, the architecture splits into two branches, each consisting of a convolution layer, a global max-pooling layer and a fully-connected layer.
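For concreteness, the following is a minimal PyTorch sketch of a ShuffleNet-V2-style basic unit with channel split, depthwise convolution, concatenation and channel shuffle; the channel counts and layer details are illustrative and not necessarily those of the exact model used here.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # Interleave channels so information mixes across the two branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class BasicUnit(nn.Module):
    """ShuffleNet-V2-style basic unit: channel split -> 1x1 conv ->
    3x3 depthwise conv -> 1x1 conv -> concatenate -> channel shuffle."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False),  # depthwise
            nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False),
            nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        left, right = x.chunk(2, dim=1)                     # channel split
        out = torch.cat((left, self.branch(right)), dim=1)  # concatenate
        return channel_shuffle(out)

# Example: a 24-channel feature map of size 28 x 28, as in the first stage.
y = BasicUnit(24)(torch.randn(1, 24, 28, 28))
```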
As an adaptive learning-rate optimization algorithm, Adam is chosen as the optimizer to train the network, with a batch size of 64, a learning rate of 0.001, and the learning-rate decay set to 0. The models are trained for 130 epochs. To avoid overfitting to the training data, Batch normalization, regularization and Dropout are introduced into the network. Specifically, a ridge regularization with a coefficient of 10⁻⁵ is added to each convolution layer, the convolution layers in the building blocks are all followed by Batch normalization layers, and a Dropout operation is placed before the fully-connected layer. To further reduce overfitting, the training process may be stopped when the network shows no significant improvement in accuracy over five epochs, which is similar to early stopping. The accuracy of 97.9% on the training dataset, compared with 96.6% on the testing dataset, shows that the network generalizes well on the dataset.
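A rough sketch of this training configuration is given below; the model, data and stopping criterion are simplified placeholders (the paper monitors accuracy, the sketch monitors the loss), and the ridge term is approximated by Adam's weight decay rather than per-layer regularizers.

```python
import torch
import torch.nn as nn

# Dummy stand-ins so the sketch is self-contained; the real model and data
# would be the two-branch CNN and the datasets described in this paper.
model = nn.Sequential(nn.Flatten(), nn.Linear(200 * 200, 2))
images = torch.randn(64, 1, 200, 200)   # one batch of grayscale inputs
targets = torch.randn(64, 2)            # placeholder labels

# Adam with lr = 0.001 and zero learning-rate decay; weight_decay approximates
# the 1e-5 ridge regularization applied to the convolution layers.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

best_loss, patience, stall = float("inf"), 5, 0
for epoch in range(130):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(images), targets)  # placeholder loss
    loss.backward()
    optimizer.step()

    if loss.item() < best_loss - 1e-4:
        best_loss, stall = loss.item(), 0
    else:
        stall += 1
    if stall >= patience:   # stop after five epochs without improvement
        break
```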
While one of the branches performs the yaw prediction, the other makes a strategic decision based on the UAV's proximity to collisions. Specifically, the collision prediction and the yaw angle prediction have two different fully-connected
layers. The structure of the CNN model is shown in Figure 6.
Figure 6: Architecture of the CNN model. The input image is preprocessed and resized into 200 × 200 pixels. Except for the concatenation and channel-shuffle layers, the layers in the red, yellow and blue boxes have 24, 48 and 96 channels, respectively.
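The preprocessing step mentioned in the caption is not spelled out in full in the text; a plausible sketch, assuming OpenCV is used for optional grayscale conversion, resizing to 200 × 200 and normalization:

```python
import cv2
import numpy as np

def preprocess(frame, use_grayscale=True, size=(200, 200)):
    """Resize a raw camera frame to 200x200 and normalize it to [0, 1].
    Grayscale conversion matches the configuration compared against RGB later."""
    if use_grayscale:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frame = cv2.resize(frame, size)
    return frame.astype(np.float32) / 255.0
```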
A regression model based on the mean-squared error (MSE) is used for the prediction of the yaw angle:
MSE(X, h) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2}    (1)
where X denotes all the feature values of the datasets, h is the prediction function of the CNN system, i.e., the CNN system outputs a predicted value h(x^{(i)}) when it is given an instance's feature vector x^{(i)}, m is the number of images in the datasets, and x^{(i)} and y^{(i)} denote respectively the feature values (in this paper, the preprocessed image itself) and the label of the i-th image. The binary cross-entropy (BCE) loss function is used to train the collision prediction. In this task, the two-class problem is described as follows:

BCE = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]    (2)
where y_i is the label of the i-th image (1 for collision and 0 for no collision), and p_i is its predicted collision probability.
Since the model is single-input and multiple-output, the solver is required to find an optimal solution for the two tasks at the same time. The gradient magnitudes of the regression task differ from those of the classification task, so naive joint optimization poses serious convergence problems. Indeed, combining the two loss functions (MSE for the regression task, BCE for the classification task) without adjusting their weights has a negative effect on the convergence of the training process, leading to adverse consequences [35]. Furthermore, it is confirmed that presenting the tasks in a meaningful order may lead to improved performance and better convergence [36]. Therefore, it is necessary
to impose weights for two loss functions when training the model. Considering
that the steering angle needs more epochs to be optimized, the weights used in
[3] are adopted by defining
L_{tot} = L_{MSE} + \max\left(0,\ 1 - e^{-\frac{e_i - e_0}{e_0}}\right) L_{BCE}    (3)
where L_{tot} is the total loss value for training, e_i is the current epoch, e_0 is set to 30, and L_{MSE} and L_{BCE} represent the loss values of the regression and classification tasks, respectively.
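For reference, the weighting of Eq. (3) can be written directly in code; mse_loss and bce_loss are assumed to be the two task losses computed elsewhere.

```python
import math

def total_loss(mse_loss, bce_loss, epoch, e0=30):
    # Eq. (3): the classification loss is switched on gradually after epoch e0.
    weight = max(0.0, 1.0 - math.exp(-(epoch - e0) / e0))
    return mse_loss + weight * bce_loss
```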
To smooth the control commands, a first-order Butterworth low-pass filter is adopted, whose squared magnitude response is

|H(j\omega)|^2 = \frac{1}{1 + \left(\frac{\omega}{\omega_c}\right)^2}    (4)

where ω_c is the cut-off frequency. This filter has the following properties:
• |H(jω)|² attains its maximum value of 1 at ω = 0, where the curve is maximally flat;
• |H(jω)|² is a monotonically decreasing function of ω, so there are no fluctuations in amplitude.
The scaled steering sk , the output of the yaw angle prediction part, is
mapped to the rotational motion of the UAV around the vertical axis of the
body coordinate system, i.e. the yaw angle θ_k. In particular, s_k in the range [−1, 1] is converted into a desired yaw angle θ_k in the range [−π/2, π/2].
As the control of steering angle does not need to be sensitive to the environment
changes like the forward speed control, a low-pass filter is used for simplicity.
As such, the smooth, continuous control command is defined as follows:
\theta_k = (1 - \beta)\,\theta_{k-1} + \beta\,\frac{\pi}{2}\,s_k, \qquad 0 < \beta < 1.    (5)
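Eq. (5) is an exponential-smoothing (first-order low-pass) update of the yaw command; a small sketch follows, where the value of β is only illustrative.

```python
import math

def update_yaw(prev_yaw, steering, beta=0.3):
    """Map the scaled steering s_k in [-1, 1] to a smoothed yaw command (Eq. 5).
    beta is a smoothing constant in (0, 1); 0.3 is only an illustrative value."""
    desired = 0.5 * math.pi * steering        # scale to [-pi/2, pi/2]
    return (1.0 - beta) * prev_yaw + beta * desired
```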
It should be mentioned that the obstacle avoidance maneuver is performed when the predicted collision probability of the image at the current time, Img_t, exceeds a certain threshold (set to 0.9). Img_t is divided into top and bottom parts of the same size. A warping with context padding (p = 16 pixels) transforms the top part into valid CNN model inputs. A 30-degree pitch angle command is transmitted to the UAV when the collision probability of the top part is lower than a certain threshold (set to 0.3), as sketched below.
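A minimal sketch of this arbitration step; predict_collision, stop_forward and send_pitch are hypothetical stand-ins for the CNN inference call and the UAV control interface.

```python
COLLISION_STOP_THRESHOLD = 0.9   # stop forward flight above this probability
TOP_CLEAR_THRESHOLD = 0.3        # pitch up only if the top part looks clear
PITCH_UP_DEGREES = 30

def arbitrate(image, predict_collision, stop_forward, send_pitch):
    """Deterministic arbitration: stop when a collision is likely, then probe
    the top part of the image and pitch up if it is free of obstacles."""
    if predict_collision(image) <= COLLISION_STOP_THRESHOLD:
        return                                   # keep normal forward flight
    stop_forward()                               # collision likely: stop going forward
    top_part = image[: image.shape[0] // 2]      # top half (context padding omitted here)
    if predict_collision(top_part) < TOP_CLEAR_THRESHOLD:
        send_pitch(PITCH_UP_DEGREES)             # climb over the obstacle
```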
Figure 7: Samples of collected images. Images are divided into two categories: images depicting environments where the UAV may function normally (green box), and images showing that the UAV may collide with the environment (red box).
The datasets are divided into two distinct categories, one labelled with the yaw angle and the other involving the collision information.
The UAV's yaw angle can be related to the self-driving car's steering angle by projecting the quadrotor's z-axis onto the horizontal plane. Two large-scale autonomous driving image datasets, containing over 1,200,000 frames collected from Comma.ai [38] and Udacity [39], are used to train the yaw angle prediction. These datasets contain video clips captured by a single front-view camera, located similarly to the UAV's front-facing camera and mounted on the windshield of the vehicle. Alongside the video data, a set of time-stamped sensor measurements is included, such as the vehicle velocity, acceleration, steering angle, GPS location and gyroscope angles. In this work, the regression branch is trained using only the front-facing camera images and the corresponding steering angles, on account of the similar viewing angle of this camera and the UAV's monocular camera. Note that interpolated estimates of the sensor logs are used to deal with the problem that the sensor logs are out of sync with the video time-stamps.
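One simple way to obtain such interpolated estimates is linear interpolation of the steering logs at the video frame time-stamps; a numpy sketch with illustrative array names:

```python
import numpy as np

def steering_at_frames(frame_times, sensor_times, steering_values):
    """Linearly interpolate steering-angle logs at the video frame time-stamps,
    so each frame gets an aligned label even if the logs are out of sync."""
    return np.interp(frame_times, sensor_times, steering_values)
```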
The datasets of collisions initially used to train the network are 45,000 im-
ages from the RPG collision data [3], which are collected in urban environments
by cyclists and labelled as positive or negative based on the distance between the camera and its nearest objects. The dataset is then enlarged with 20,000 further samples from experiments based on our networks trained on the datasets mentioned above. Positive data need to be collected manually because of the imbalance between positive and negative samples, which results from manually stopping the UAV when it is about to collide. Figure 7 depicts some samples of
collected images. Datasets used in this paper are summarized in Table 1.
Table 1: Datasets used in two prediction models
4. Experiments
To evaluate the yaw angle prediction, four metrics, namely the mean absolute error (MAE), the root mean squared error (RMSE), the explained variance score (EVS) and the coefficient of determination (R²), are used. They can be calculated respectively as follows:

MAE(X, h) = \frac{1}{m}\sum_{i=1}^{m}\left|h(x^{(i)}) - y^{(i)}\right|,    (6)

RMSE(X, h) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2},    (7)

EVS(h, Y) = 1 - \frac{\mathrm{Var}\{y - h(x)\}}{\mathrm{Var}\{y\}},    (8)

R^2 = 1 - \frac{SS_{res}}{SS_{tot}}    (9)

where SS_{res} is the sum of squared differences between the labels and the predictions, and SS_{tot} is the sum of squared differences between the labels and their average value.
Accuracy, receiver operating characteristic (ROC) curve and F1-score are
used to evaluate the quality of the model, where F1-score is defined as:
F_1 = \frac{TP}{TP + \frac{FN + FP}{2}}    (10)

where TP, FP and FN denote the numbers of true positives, false positives and false negatives, respectively.
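For reference, the metrics in Eqs. (6)–(10) can be computed directly from predictions and labels; a numpy sketch with illustrative names:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                                             # Eq. (6)
    rmse = np.sqrt(np.mean(err ** 2))                                      # Eq. (7)
    evs = 1.0 - np.var(y_true - y_pred) / np.var(y_true)                   # Eq. (8)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)  # Eq. (9)
    return mae, rmse, evs, r2

def f1_score(tp, fp, fn):
    # Eq. (10): F1 = TP / (TP + (FN + FP) / 2)
    return tp / (tp + 0.5 * (fn + fp))
```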
[Figure: confusion matrix of the collision prediction — 98.36% of no-collision images and 92.91% of collision images are classified correctly.]
Figure 10: Heatmaps of collision prediction. Images from left to right show the environ-
ments of dynamic obstacles, a city road, a tree at night and a jungle.
4.5. Control commands processing
In this subsection, the deterministic arbitration methods mentioned in Section 3.4 are compared via 564 pairs of data taken from the collision datasets. To make
the results more convincing, the datasets contain two types of state changes, i.e.
from no-collision to collision and from collision to no-collision. From Figure 11, it
is found that the first-order Butterworth filtering has the highest response speed
when the state change occurs and is able to provide smooth control commands
when applied to the original data.
Figure 11: Smoothing and truncation of control commands using various filters.
The bottom figures show more details of the top figure at different states: close to obstacles
(top left), keeping a safe distance from obstacles (bottom left), leaving the obstacles and
returning to normal flight away from obstacles (bottom right).
4.7. Runtime analysis
To analyze the effects of the system, the time-consumption and varying de-
lays, including image and control transmission delays of three major components
(Image preprocessing, CNN processing and control mechanism) are considered.
The correlational analysis of time consumption is presented in Figure 12. According to the measurements of time consumption, there is no significant difference whether or not the cost of the image preprocessing (970 fps) and control mechanism parts is added. The time consumption of the proposed framework, as can be seen from these measurements, is acceptable for automatic obstacle avoidance of the UAV.
[Figure: image and control command transmission between the UAV and the ground station over a wireless LAN, with the associated transmission delays.]
Figure 13: Obstacle avoidance is achieved by the UAV. In each subfigure, the image
in the lower right corner shows the environment of the UAV, while the image on the left is
taken by the front-facing camera of the UAV. (a) Avoidance of a pedestrian; (b) Avoidance of a bicycle; (c) Avoidance of a car; (d) Avoidance of a tree at night.
[Marked areas on the map: Curve, Road1, Road2, Trod1, Trod2, Jungle.]
Figure 14: A map of the campus, with marked areas indicating testing environ-
ments. These areas are good representations of the urban environment. Pictures around the
map present some featured samples of these areas.
4.8.1. Outdoor
To evaluate the generalization performance, eight scenes covering day, night, outdoor and indoor conditions that are not included in the training datasets have been chosen as experimental environments. Figure 14 illustrates the test environments. For each sub-task, the performance of the system with grayscale images is compared with that of the system with RGB images and with Dronet.
Road test. With self-driving datasets used for training the model, the UAV shows a reliable performance when tested on the scenes Road1 and Road2. Three experiments are carried out in each environment, and similarly good experimental results are obtained, as summarized in Table 4. The average flight distances of the grayscale system, 368 m on Road1 and 154 m or more on Road2, indicate that the proposed automatic obstacle avoidance system has a robust fitting ability for highway scenes and can be easily readapted to a new road that includes a straight line or a bend.
Complex trod test. The proposed system is tested in two complicated trod en-
vironments. There is no marked lane in such environments, which may affect
the decision of the network, so navigation is more challenging. Results
confirm that the UAV can fly autonomously over a distance, although the lawns beside the trod have an inevitable impact on the obstacle avoidance system. Our
model with grayscale images achieved the best performance, as summarized in
Table 4.
Jungle flight test. For further testing, the generalization capabilities of the sys-
tem in complex scenarios are evaluated by flying the UAV in a jungle. For this scenario, as the model has not been trained on similar datasets, the UAV using the other networks is unable to navigate to a target, whereas our network with grayscale images finds a relatively better route, with experimental results
recorded in Table 4.
Night test. The experiment at night is designed to verify the system’s capabil-
ity to deal with different lighting conditions. As shown in Table 4, the system
performs a 45 m flight after being trained with datasets covering different lighting conditions, a considerable improvement over the other systems. Furthermore, this is an advantage compared with the initial system, which achieved a flight distance of nearly 25 m. The better result may be attributed to more train-
ing samples, which were collected from the experiments on the initial system
(experimental data of Road2 is excluded). The experiment shows that the per-
formance of the proposed system with grayscale images is comparable to the
other two mechanisms in outdoor environments. Indeed, in every environment
test, our system can respond to situational changes at any moment. A key observation is the failure of the other methods to avoid collisions with bushes, as bushes give the model the wrong information that surrounding objects are very close to the UAV. However, our method can fly over the bushes thanks to the re-prediction on the top part of the image.
4.8.2. Indoor
The system with different forms of images is tested in an underground garage to evaluate its performance in indoor environments, as shown in Figure 15. The most striking observation from the comparison is that the experiment relying on grayscale images performs slightly better than that relying on RGB images. This
discrepancy could be attributed to the delay caused by time consumption, i.e.,
processing an RGB image takes longer than processing the grayscale image in
transmission, image preprocessing, and network operation. According to the
comparison, we can infer that the RGB image is not necessary for automatic
obstacle avoidance, although it contains more environmental information than
grayscale images.
5. Conclusion
Figure 15: An underground garage environment, in which the red curves represent
paths with different forms of images.
In this paper, a CNN-based automatic obstacle avoidance system for a quadrotor UAV has been presented, trained on real-world driving datasets and UAV collision datasets collected by flying the UAV manually. The first-order
Butterworth filtering is used to map the outputs of networks to UAV commands.
Extensive experiments using five outdoor/indoor scenarios prove that the system
proposed in this paper is efficient for UAV obstacle avoidance tasks due to its
low sensor requirement, light-weight network structure, strong learning ability
and environmental portability. The platform used for experiments relies on
WiFi, which imposes restrictions on communication distance between UAV and
ground control station. Therefore, an image/data transmitter of UAV based on
4G/5G mobile communication technology is being developed by our team, and
a platform with the transmitter will be investigated in our future work.
References
[7] T. W. Ubbens, D. C. Schuurman, Vision-based obstacle detec-
tion using a support vector machine, in: 2009 Canadian Confer-
ence on Electrical and Computer Engineering, 2009, pp. 459–462.
doi:10.1109/CCECE.2009.5090176.
[8] F. Espinosa, M. R. Jiménez, L. R. Cárdenas, J. C. Aponte, Dynamic ob-
stacle avoidance of a mobile robot through the use of machine vision al-
gorithms, in: Symposium of Signals, Images and Artificial Vision - 2013:
STSIVA - 2013, 2013, pp. 1–5. doi:10.1109/STSIVA.2013.6644903.
[9] N. Dinh Van, G. Kim, Fuzzy logic and deep steering control based recom-
mendation system for self-driving car, in: 2018 18th International Confer-
ence on Control, Automation and Systems (ICCAS), 2018, pp. 1107–1110.
[10] A. Giusti, J. Guzzi, D. C. Cireşan, F. He, J. P. Rodríguez, F. Fontana,
M. Faessler, C. Forster, J. Schmidhuber, G. D. Caro, D. Scaramuzza, L. M.
Gambardella, A machine learning approach to visual perception of for-
est trails for mobile robots, IEEE Robotics and Automation Letters 1 (2)
(2016) 661–667. doi:10.1109/LRA.2015.2509024.
[11] X. Yang, H. Luo, Y. Wu, Y. Gao, C. Liao, K.-T. Cheng, Reactive obstacle
avoidance of monocular quadrotors with online adapted depth prediction
network, Neurocomputing 325 (2019) 142–158.
[12] N. J. Sanket, C. D. Singh, K. Ganguly, C. Fermüller, Y. Aloimonos,
Gapflyt: Active vision based minimalist structure-less gap detection for
quadrotor flight, CoRR abs/1802.05330. arXiv:1802.05330.
URL http://arxiv.org/abs/1802.05330
[19] P. Chakravarty, K. Kelchtermans, T. Roussel, S. Wellens, L. V. Eycken,
CNN-based single image obstacle avoidance on a quadrotor, in: IEEE In-
ternational Conference on Robotics & Automation, IEEE, 2017, pp. 6369–
6374.
[20] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand,
M. Andreetto, H. Adam, Mobilenets: Efficient convolutional neural net-
works for mobile vision applications, CoRR abs/1704.04861.
[32] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network
training by reducing internal covariate shift, CoRR abs/1502.03167.
arXiv:1502.03167.
URL http://arxiv.org/abs/1502.03167
[33] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov,
Dropout: a simple way to prevent neural networks from overfitting, The
journal of machine learning research 15 (1) (2014) 1929–1958.
[34] W. Wang, Z. Pan, Dsnet for real-time driving scene semantic segmentation,
CoRR abs/1812.07049. arXiv:1812.07049.
URL http://arxiv.org/abs/1812.07049
[35] Z. Ren, D. Dong, H. Li, C. Chen, Self-paced prioritized curriculum learning
with coverage penalty in deep reinforcement learning, IEEE transactions
on neural networks and learning systems 29 (6) (2018) 2216–2226.
[36] S. Ruder, An overview of gradient descent optimization algorithms, CoRR
abs/1609.04747. arXiv:1609.04747.
URL http://arxiv.org/abs/1609.04747
[37] G. Bianchi, R. Sorrentino, Electronic filter simulation & design, McGraw
Hill Professional, 2007.
[38] Comma.ai, Public driving dataset, https://github.com/commaai/research, accessed March 7, 2017.
[39] Udacity, Public driving dataset, https://www.udacity.com/self-driving-car, accessed March 7, 2017.
[40] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, D. Batra,
Grad-cam: Why did you say that? visual explanations from deep networks
via gradient-based localization, CoRR abs/1610.02391. arXiv:1610.02391.
URL http://arxiv.org/abs/1610.02391