A Tree-Based Approach for Visible and Thermal Sensor Fusion
https://doi.org/10.1007/s00138-024-01546-y
RESEARCH
Received: 20 November 2023 / Revised: 14 March 2024 / Accepted: 17 April 2024 / Published online: 3 May 2024
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024
Abstract
Research on autonomous vehicles has recently peaked. One of the most studied aspects is the performance degradation of sensors in harsh weather conditions such as rain, snow, fog, and hail. This work addresses that degradation by fusing multiple sensor modalities inside the neural network used for detection. The proposed fusion method removes the pre-processing fusion stage and produces detection boxes directly from several input images, reducing the computation cost by performing detection and fusion simultaneously. By separating the network in its initial layers, the network can easily be modified for new sensors. Intra-network fusion improves robustness to missing inputs, applies to all compatible types of inputs, and reduces the peak computing cost through a valley-fill algorithm. Our experiments demonstrate that adopting a parallel multimodal network to fuse thermal images inside the network improves object detection in difficult weather conditions such as harsh winters by up to 5% mAP while reducing dataset bias. It does so with around 50% fewer parameters than late-fusion approaches, which duplicate the whole network instead of only the first section of the feature extractor.
Keywords Pedestrian detection · Deep learning · Machine learning · Road vehicle identification · Winter · ADAS
However, thermal imaging has multiple drawbacks. Working with temperature instead of light rays makes it unable to see through Lambertian surfaces [25]. Another disadvantage of thermal imaging is that, being a costly sensor, data available for training is scarce. The imbalance between available visible and thermal images must be counteracted with a process allowing training with missing data without losing the benefit of the large-scale datasets of the visible spectrum.

Fusion is a widely researched topic in the literature, and several fusion methods have been proposed. Three main approaches exist: High-level Fusion (HLF), Mid-level Fusion (MLF), and Low-level Fusion (LLF). High-level fusion is fusion post-detection [26]. HLF cannot use cross-channel information and generally results in lower classification performance in harsh conditions [26]. Low-level fusion is the fusion of the raw input of the sensors. While it allows all information to improve the task's performance, it also requires precise extrinsic calibration and is the least tolerant to the distance between sensors [26]. Lastly, MLF is an abstraction between low-level and high-level fusion. It fuses some relevant subset of the information from a statistical point of view [27]. MLF requires extensive training to extract those relevant features. Current approaches to MLF are independent of the detection phase and tend to fuse raw input into features [28]. Current MLF approaches need to be revised for level 4 and 5 autonomy. However, those MLF methods bring many advantages, such as fidelity to spatial details, low computational complexity, and robustness to imprecise calibration. Following the MLF approach is therefore the best option for the current work. The proposed network applies fusion inside the network to achieve lower processing cost and high performance.

Fusion at the input or in the network, as in LLF and MLF methods, requires sensors to be calibrated [26]. Two types of calibration are needed. The first type is spatial calibration [29], which requires knowing the distance and rotation between each sensor. The second type, temporal calibration, is synchronizing the inputs in a compatible way [30]. The primary approach to temporal synchronization is dynamic programming to minimize the time between the sensors' captures, hence reducing latency from the fusion. However, that approach cannot be performed in real-time without added latency, and the primary technique for a real-time process is simply waiting for all inputs to be obtained. These methods require that all information be present, making the synchronizing approach vulnerable to missing sensor readings [31]. During real-time execution, sensors do not adhere to a perfect frequency. Factors such as bandwidth or simply imprecise sensor clocks cause variations in the frequency of the sensor capture. Thermal sensors are known to have a lower frequency than visible sensors, and a well-designed fusion algorithm can handle that difference in sensor order in tandem with the variation in sensor frequency.

This research uses a novel time synchronization technique that is both low latency and resilient to sensor failures. The proposed fusion network uses an approach based on MLF that avoids the intermediary fusion stage to improve the results. It combines the fusion and detection networks into a low-cost network adapted to edge devices.

A summary of the contributions of this article is presented below.

1. A network-level fusion strategy for multimodal image data is proposed to improve the overall performance of computer vision-based detection methods for self-driving vehicles in challenging environmental conditions.
2. A novel synchronization method is introduced that provides resilience to sensor failures, low latency, and real-time synchronization.
3. End-to-end learning from data fusion to object detection is adopted for various deep-learning models (fusion & detection) for better performance and reduced operational complexity.
4. The proposed fusion strategy accommodates imbalanced datasets with minimal bias and therefore handles the data scarcity problem associated with specialized imaging modalities.
5. A fully automated network splitting optimizer is proposed to realize network-level fusion.

The rest of this paper is organized as follows. Sect. 2 presents related works, Sect. 3 discusses the proposed method of modularity instead of manual fusion, and Sect. 4 offers the proof of concept for the multimodal input. Finally, Sect. 5 concludes this paper.
2 Related works

2.1 Fusion

2.1.1 CNN based

Image fusion has multiple families of approaches. The first family is the pan-sharpening method [8]. This method allows the fusion of images of different resolutions and spectrums, like satellite imaging. One of the best-performing pan-sharpening methods is the convolutional auto-encoder, which improves the quality of low-resolution images to be fused with the high-quality ones after processing [32, 33]. Component Substitution (CS) [34–37] is also a popular fusion algorithm that converts the original inputs into another set of features combining the originals [8]. Most current approaches convert RGB + Thermal inputs into a transformed RGB image containing the thermal information inside the three channels. Many of those approaches utilize deep-learning-based networks to fuse the image using an encoder-decoder architecture [8]. Modern techniques such as auto-encoders work on the encoder-decoder architecture and have been applied to CS fusion [32, 38]. However, the main problem of those independent pre-network fusions is the additional cost of the fusion network, which is independent of the detection. Current state-of-the-art approaches include but are not limited to DenseFuse [39], belonging to the family of CNN deep learning.

2.1.2 GAN and auto-encoder based

Two other families of deep learning exist: generative adversarial networks (GAN) and auto-encoders (AE) [38]. GAN-based fusion [40–42] generates a new image from the feature vector representing it, while auto-encoders perform component substitution of an image to reproduce an output of the same dimension. Auto-encoders are mainly used to denoise the input and improve its quality in a whole, semi-supervised way, as their output should be their input. To apply auto-encoders to images, they must be convolutional; as such, auto-encoders are encoder-decoder CNNs in which the output size is the same as the input [38]. Auto-encoders cannot produce a new image and can only regenerate the input or a variation of it when denoising [38]. In [38], the authors introduce variational auto-encoders, generating hidden features using a Gaussian distribution following the equivalent-sized layer in the encoder network. The generative aspect of that method allows for generating higher quality fused images than the other deep-learning-based methods. These SOTA auto-encoder methods constitute one of the many basic architectures the proposed method uses, and they are introduced as backbones.

2.2 Detectors

Fusion alone does not provide object detection capabilities; it only generates a combined image from inputs. Object detection must be performed on the fusion output using dedicated object detectors. Many state-of-the-art networks have been implemented in multiple frameworks such as Pytorch [43] and offer plug-and-play pre-trained networks that work in ideal weather. When using that detection in other weather situations or for detecting other objects, fine-tuning the network is encouraged, as many patterns can be reused from the existing training [24].

Classic networks like the Faster-RCNN [44] were based on a typical shared process. Those networks hypothesized bounding boxes, resampled the content of those boxes, and applied a classifier on the resulting sub-image. This pipeline has been the de facto way for all networks. Still, it has the inconvenience of being computationally intensive, restricting its use to high-end devices, and those methods are also too slow for real-time applications such as self-driving [45].

The SSD [45] method introduced the first deep network-based object detector that didn't resample pixels or features for classification. That method works by having a convolutional-based classifier at a few steps through the network to allow different aspect ratios and scales of objects. Removing the independent classification network has achieved real-time detection with a slight drop in mAP to 25.1%.

3 The proposed model

This section describes the proposed method of creating branches and how to size them appropriately. It also explains how to modify existing networks without creating a new one and how to select the fusion point. Next, it presents the required input preparation and the implementation of the proposed method. Finally, it provides the necessary syncing process and the training protocol. The proposed method aims to improve the detection performance of road elements, including pedestrians, vehicles, traffic signs, lights, motorcyclists, and cyclists, particularly under challenging conditions like winter, nighttime, or rain, while reducing the processing cost and enhancing the robustness for autonomous vehicles.

3.1 Proposed branches approach

Inspired by SSD [45] and YOLO [46–49], which combine region proposal and object detection, the proposed fusion method combines the fusion network and the detection network into an end-to-end fusion detector. The need for an intermediary representation or transformation is removed by having intra-network fusion. Consequently, the need for manual labeling of the fusion is eliminated by removing the intermediary model. However, this implies that the fusion cannot be visualized. A fusion layer must be included in the network to enable the network to learn data fusion. This paper proposes splitting the network into branches that treat part of the information independently. These branches merge at a level decided at the modeling step. They induce modality-specific feature extraction in the first layers of the network while keeping the abstraction in deep layers. Each of those branches adds computational cost to the network.

Using the branching approach lowers the processing cost compared to HLF fusion networks. The cost of branching a network is the following:

C_{FO}(k) = \sum_{i=1}^{k} \sum_{c=1}^{q} FO_i^c + \sum_{i=k+1}^{K} FO_i    (1)

where FO_i^c is the number of operations for layer i for channel c, C_{FO}(·) is the cost in floating-point operations, FO_i is the number of operations for layer i, k is the layer before which we merge, K is the total count of layers, and q is the number of channels.

The cost of evaluation of a convolutional layer FO_i is defined as:

FO_i = d_i \times w_i \times h_i \times c_{x_i} \times c_{y_i} \times d_{i-1}    (2)

where w_i and h_i are the sizes of the feature map at layer i, c_{x_i} and c_{y_i} are the sizes of the convolutional filter, and d_i is the number of features.

Effectively, branches can be split with different sizes, which, when sizing is adequately done, will reduce the bias towards data in high quantity by providing neurons proportional to the desired importance. Proper sizing of the branches can further reduce the cost of processing by eliminating the need for extra neuron calculation.

A simplified equation can be used if each branch has the same number of features, which is defined by:

C_{FO}(k) = \sum_{i=1}^{k} (n \times FO_i) + \sum_{i=k+1}^{K} FO_i    (3)

where n is the number of channels to merge.
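To make the cost bookkeeping above concrete, the short Python sketch below evaluates the per-layer cost of Eq. (2) and the simplified branched cost of Eq. (3) for a toy backbone with two branches. The layer dimensions and the two-branch setup are illustrative assumptions, not the configuration used in this work.

```python
# Minimal sketch of the branching-cost bookkeeping in Eqs. (2)-(3).
# The layer list below is a made-up 6-layer backbone, not the paper's.

def conv_flops(d_out, w, h, cx, cy, d_in):
    """Eq. (2): operations of one convolutional layer."""
    return d_out * w * h * cx * cy * d_in

def branched_cost(layer_flops, k, n_branches):
    """Eq. (3): layers 1..k are duplicated once per branch (same width),
    layers k+1..K form the shared trunk and are computed once."""
    branch_part = n_branches * sum(layer_flops[:k])
    trunk_part = sum(layer_flops[k:])
    return branch_part + trunk_part

# Hypothetical per-layer FLOPs for two branches (e.g. visible + thermal).
flops = [conv_flops(64, 300, 300, 3, 3, 3),
         conv_flops(128, 150, 150, 3, 3, 64),
         conv_flops(256, 75, 75, 3, 3, 128),
         conv_flops(512, 38, 38, 3, 3, 256),
         conv_flops(512, 19, 19, 3, 3, 512),
         conv_flops(512, 10, 10, 3, 3, 512)]

for k in range(1, len(flops) + 1):
    print(f"k={k}: {branched_cost(flops, k, n_branches=2):.3e} FLOPs")
```

Running the loop shows how the total cost grows as the merge point k moves deeper into the network, which is the trade-off discussed in Sect. 3.4.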
3.2 Proper branches sizing

Neural networks tend to split the weight across all the inputs almost evenly, as shown in Table 1, which reports the weights applied to each input channel. The red channel averages 33.13% of the weights of the pre-trained networks, while the green and blue channels account for 41.79% and 25.06%. A slight bias is shown towards the green channel, which is more determinant than blue in an image. However, that value is still in the range that statistically follows the tendency to separate the weight equally across inputs. Using the distribution of weights mentioned above, adding a modality such as thermal to a color image would place around 25% of the weights on that modality. However, due to local maximums of the weights in the network, there is a possibility that this fourth channel might tend to have a deficient proportion. Additional information would not follow this distribution using a pre-trained network that has already attained its maximum. This misintegration of the additional data would not affect the result enough to increase global accuracy. Multiple solutions exist to this misintegration of information. As each branch is independent of the others, it is possible to partially train the network or reuse branches.

Table 1 Statistics on the weight percentage of each channel in pre-trained networks from Pytorch [43]

Model                        Red    Green  Blue
fasterrcnn_mnet_v3_lg_fpn    28.83  56.11  15.05
fasterrcnn_resnet50_fpn      34.04  38.21  27.74
ssd300_vgg16 [45]            29.70  36.58  33.70
ssdlite320_mnet_v3_large     33.95  46.43  19.61
resnet50 [50]                34.04  38.21  27.74
alexnet [51]                 35.56  35.73  28.70
densenet121 [52]             34.14  40.36  25.49
efficientnet_b0 [53]         30.74  45.20  24.04
inception_v3 [54]            39.56  31.09  29.34
mnasnet0_5 [55]              31.16  46.49  22.34
regnet_x_16gf                32.91  42.81  24.27
shufflenet_v2_x0_5           33.09  45.22  21.67
vgg11 [56]                   34.59  37.33  28.07
Average                      33.13  41.79  25.06
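Per-channel shares like those of Table 1 can be approximated with a few lines of PyTorch: the sketch below measures, for one pretrained torchvision model, how the absolute weight mass of the first convolution is distributed over the red, green, and blue input channels. Using the sum of absolute first-layer weights as the statistic is an assumption; the exact statistic behind Table 1 may differ.

```python
# Rough per-channel weight share of a pretrained first convolution.
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1")
w = model.conv1.weight.detach().abs()          # shape: [64, 3, 7, 7]
per_channel = w.sum(dim=(0, 2, 3))             # one value per input channel
share = 100.0 * per_channel / per_channel.sum()
for name, s in zip(("red", "green", "blue"), share.tolist()):
    print(f"{name}: {s:.2f}%")
```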
By adjusting the size of branches, the desired proportion of neurons and filters associated with a specific modality can be enforced. The branch size is crucial: it forces the network to consider the inputs, no matter the ratio of that data in the information. For similar accuracy, the network size can be adapted to a lower calculation point than the vast network. The output of those branches must be of a compatible spatial size to be concatenated, but there is no restriction on the size of the branch. The same spatial resolution is needed to concatenate the branches' output, but they can have any number of features. To alleviate the additional number of floating-point operations caused by the branches, proper sizing must be provided. Higher accuracy on smaller-sized datasets has been proven to be achieved by smaller NNs, as they need fewer images to be trained [57]. According to the desired problem, dataset size, and dataset quality, the proper size for each branch needs to be defined. The proposed branch size is the inverse proportion of the image count for each branch. To remove the bias toward data, the proportion of the dataset multiplied by the ratio of neurons in the output of the branch should be similar across branches. This allows reducing the branches' size, and thereby the cost. However, it should be noted that depending on the data size, the branch size can be adjusted, and in the worst case (scarce dataset), overfitting may occur in the corresponding data branch only.
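A minimal sketch of this sizing rule follows, assuming branch widths inversely proportional to each modality's image count so that dataset share times neuron share stays roughly constant; the image counts and the 64-feature budget are hypothetical values, not the ones used for the proposed network.

```python
# Give each modality a branch width inversely proportional to its image count.

def branch_widths(image_counts, total_features):
    inv = {m: 1.0 / n for m, n in image_counts.items()}
    norm = sum(inv.values())
    return {m: max(1, round(total_features * v / norm)) for m, v in inv.items()}

counts = {"visible": 120_000, "thermal": 5_322}   # e.g. COCO-scale vs FLIR ADAS-scale
widths = branch_widths(counts, total_features=64)
print(widths)   # the scarce thermal modality receives most of the 64 features

# Check the balancing criterion: (dataset share) x (neuron share) is similar.
total = sum(counts.values())
for m in counts:
    print(m, round((counts[m] / total) * (widths[m] / 64), 4))
```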
3.3 Modifications to existing networks

To apply the current approach, networks must receive minimal changes. A state-of-the-art backbone network, such as Alexnet [51] or VGG-16 [56], is first chosen as the base network. The first modification to be applied to the network is duplicating the first k layers, as seen in Fig. 1. For applying the weights, duplication doesn't alter the size, making weights compatible except for the fusion and first layers. The first layer will have an input size of one feature per branch. The fusion layer at k has n times the features, where n is the number of branches. For that layer to receive all the required inputs, the output of all branches is concatenated. However, the convolutional size and the number of filters will be changed to specialize the network's branches and prevent reusing existing kernels. Each branch being independent, they can be processed on different processing units or spread out using a valley-fill algorithm to be run during downtimes between captures.

The choice of k impacts the detection performance and processing cost. If k is chosen to be close to the beginning of the network, it acts as an LLF. The main problem with such approaches is the lack of modularity. In contrast, a k value next to the network's end acts as an HLF, significantly increasing the processing cost. That approach's drawback is that the whole network must be computed even on channels with no information, such as a green channel, when detecting a stop sign. It is crucial to take into consideration both calculation cost and meaningful information. Neural networks work by abstracting more details at each level until detection. For example, an array of colorful pixels in the input can later become the abstraction of a square. This abstraction is the best point to split the network, as each channel will abstract similar information for the same object, even though input distributions will vary.

The abstraction point of a NN cannot be readily determined. Each network architecture and each dataset will imply a different level of abstraction for the convolutional layers. Another drawback of not knowing the abstraction layer is that channel-specific patterns may be abstracted after the fusion layer, reducing efficiency for all channels if the fusion is achieved too soon in the network. Merging at the end of the network or having incorrectly sized features in the branches will result in higher computation and require a more considerable dataset extent.
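The modification of Sect. 3.3 can be sketched in PyTorch as a branch-and-trunk module: the first k layers are duplicated per modality, the branch outputs are concatenated at the fusion layer, and the remaining layers are shared. The layer counts and widths below are illustrative assumptions, not the backbone actually used in this work.

```python
# Illustrative branch-and-trunk backbone with intra-network concat fusion.
import torch
import torch.nn as nn

class BranchedBackbone(nn.Module):
    def __init__(self, branch_channels=(1, 1), branch_width=32, trunk_width=64):
        super().__init__()
        # one independent branch (layers 1..k) per input modality
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c, branch_width, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(branch_width, branch_width, 3, padding=1), nn.ReLU(inplace=True),
            )
            for c in branch_channels
        ])
        # fusion layer at k: its input width is n_branches * branch_width
        fused = branch_width * len(branch_channels)
        self.trunk = nn.Sequential(
            nn.Conv2d(fused, trunk_width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(trunk_width, trunk_width, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, inputs):
        # inputs: one tensor per modality, all at the same spatial resolution
        feats = [branch(x) for branch, x in zip(self.branches, inputs)]
        return self.trunk(torch.cat(feats, dim=1))   # sync + concat point

visible = torch.randn(1, 1, 128, 128)   # e.g. one visible channel
thermal = torch.randn(1, 1, 128, 128)   # thermal channel
print(BranchedBackbone()((visible, thermal)).shape)   # torch.Size([1, 64, 128, 128])
```

A detection head such as SSD can then consume the trunk's feature maps, since only the spatial resolution of the branch outputs must match for the concatenation.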
3.4 Evaluating split layer

To propose a value for k, the network must first be trained with k = K, meaning the fusion is placed at the end. Once the data from that training has been acquired, there are two options. In the first option, a layer visualizer is used to display the weights of each layer to a neural network expert. That expert identifies the layer number with the most similar weights and uses this value as k.

The second, automatic option approximates the expert decision based on entropy. Four steps are needed to extract the information from the previous training. In the first step, the computer evaluates three statistics for each channel of all layers. The first statistic is the entropy, calculated with:

H(X) = -\frac{\sum_{n=1}^{N} \mathrm{logsoftmax}(X)_n \times e^{\mathrm{logsoftmax}(X)_n}}{N}    (4)

where N is the size of the sample, X the sampled value, and n an iterator over the samples. The second statistic is the mean, calculated with:

\bar{X} = \frac{\sum_{n=1}^{N} X_n}{N}    (5)

And finally, the variance, calculated with:

\mathrm{Var}(X) = E\left[(X - \bar{X})^2\right]    (6)

where E is the symbol for expected value. The difference between those values for each layer is then computed:

\sum_{i=2}^{C} S_i - S_{i-1}    (7)

where S_i is one of the aforementioned statistics. Then, the three statistics are joined into a single one with a weighted sum such as Minkowski:

\mathrm{MINKOWSKI} = \sqrt[p]{\omega_x \times |X_2 - X_1|^p + \omega_y \times |Y_2 - Y_1|^p + \omega_z \times |Z_2 - Z_1|^p}    (8)
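A hedged sketch of this automatic option is given below: per-layer entropy, mean, and variance are computed from the k = K training weights following Eqs. (4)–(6), the layer-to-layer differences of Eq. (7) are combined with the Minkowski sum of Eq. (8), and k is proposed where consecutive layers are most similar. The selection criterion (argmin), the weights ω, and the order p are assumptions for illustration.

```python
# Sketch of an entropy-driven proposal for the split layer k.
import torch
import torch.nn.functional as F

def layer_stats(weight):
    x = weight.flatten()
    logp = F.log_softmax(x, dim=0)
    entropy = -(logp * logp.exp()).sum() / x.numel()     # Eq. (4)
    return torch.stack((entropy, x.mean(), x.var()))      # (H, mean, var)

def propose_k(conv_weights, w=(1.0, 1.0, 1.0), p=2.0):
    stats = torch.stack([layer_stats(wt) for wt in conv_weights])  # [L, 3]
    diffs = (stats[1:] - stats[:-1]).abs()                          # Eq. (7)
    w = torch.tensor(w)
    minkowski = (w * diffs.pow(p)).sum(dim=1).pow(1.0 / p)          # Eq. (8)
    # assumption: merge where successive layers look most alike
    return int(torch.argmin(minkowski)) + 1

weights = [torch.randn(16, 3, 3, 3), torch.randn(32, 16, 3, 3),
           torch.randn(64, 32, 3, 3), torch.randn(64, 64, 3, 3)]
print(propose_k(weights))
```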
objects exhibit negligible temporal displacement. The tolerance in positional change should be constrained by the convolutional window size. Therefore, the parameters for fine-tuning and inference should be nearly identical.

3.6 Proposed implementation

forward layers, making them adapted for network fusion. The encoder-based network consists of five layers with a kernel of size three, followed by five convolutions with padding of two. The Single Shot Detector (SSD) [45] assumes the detection part of the network, as it can work with any backbone.

3.7 Proposed syncing method

Inherently merging layers in the network allows parallel or partial calculation of the network, making separate processing of sensors with different frequencies possible and uniformizing the calculation burden. The uniformization is done according to a valley-fill algorithm. Valley-fill algorithms distribute the processing over a period where the calculation cost would have been lower, averaging it; they aim to spread out and reduce computation peaks [59, 60]. Calculating layers 1 to k for each input with inputs at varying frequencies is possible. Henceforth, a syncing technique must be used [61].

As shown in Fig. 2, a regular syncing method that acts before the network introduces a lot of latency and processes all data simultaneously, thus requiring better computing equipment to keep both the latency and frequency at real-time levels.

Several possibilities are studied to reduce the processing cost, mainly three methods, each with three variants. Table 2 shows the specifics of each technique. Temporal calibration of sensors requires calibration methods that are robust to missing data and have minimal latency.

Table 2 Different syncing methods

Method          Variant      Cost    Latency    Missing data
Wait (A)        Parallel     High    Low        Failure
                Queue        Low     High       Failure
                First come   –       –          Failure
Fixed Freq (B)  Parallel     High    Moderate   Unaffected
                Queue        Low     High       Unaffected
                First come   Low^a   Moderate   Unaffected
Wait +          Parallel     High    Moderate   Latency
Max delay (C)   Queue        Low     High       Latency
                First come   Low^a   Low        Latency
^a May be moderate

Figure 2 illustrates the proposed methods with a visual representation. The purple circle signifies the input from the FLIR camera. The blue circle represents the input from the Mynteye P camera. The green line denotes the latency between the incoming data and the processing, represented by the yellow triangle. The crossed-out circles indicate data that has been disregarded, as more recent data on the same channel was received before processing. Using the most recent picture helps reduce the time difference between capture and processing. Figure 2b shows a theoretical event where the FLIR camera stopped working.

The first method is to wait for all data to be available before the processing starts. As in timeline A of Fig. 2a, during an optimal case the latency is short, but this first method fails if a sensor failure happens as in Fig. 2b:

• Its first variant is to process all data simultaneously once everything arrives.
• The second variant is to process each network branch one at a time but still wait for all the info. This method reduces the peak cost at the expense of latency.
• The third variant isn't available for this method, as we cannot process the information as it comes and still wait for all information to be available.

The second method aims to resolve the sensor failure issue using a fixed frequency for data processing, as depicted in the B timeline of Fig. 2a. With a fixed processing interval, missing a sensor will not impact the system since the period between data processing remains constant:
• The first variant of this method aims to moderate the latency, as seen in Fig. 2a. Still, the fixed frequency might create false positives for sensor failures and induce lags if it is not adequately timed with the sensors. It also has a high processing cost.
• Through the second variant, all the data is processed one at a time once the frequency has been reached, creating a high latency, but the cost stays low.
• The third variant processes incoming information immediately, and only the shared part of the network is processed at a fixed frequency. However, this first-come, first-processed approach may lead to redundant processing for a sensor. If a sensor's frequency allows it to arrive twice before the fixed-frequency trigger, two processing events will occur, even if only the most recent output is used.

The current paper proposes a third method to address the moderate latency and double-processing issues while handling sensor failures. This third synchronization method offers the advantages of both previous ways. Instead of using a fixed frequency, this method waits for the information like the first one. However, a max delay starts counting once the first sensor is read. The information is processed if this delay is reached before retrieving data from all the sensors. In Fig. 2a, this method is represented as the C timeline, in which the latency lines are at their maximum duration:

• The first variant processes all information after the trigger is activated, resulting in parallel processing and high cost. However, latency can be moderate in the worst case and low in the best case, since the delay is a fixed frequency if a sensor is missing.
• The second variant queues each branch once the trigger is activated. Like the other methods, the cost becomes low and the latency high.
• For the third variant, each branch is processed when data are available, reducing the peak cost, and the early exit induced when all sensors have data available reduces the latency to a minimum. The negative aspect of this method is that a missing sensor causes latency.

All syncing methods and their variants are presented in Table 2. It is important to note that, in contrast to the previous methods, each branch is processed separately. Except for the first variant, in which all data is processed after the firing of the syncing method, only the trunk would be processed after the firing of the syncing procedure. The method should be attached to a learning mechanism that turns off a missing sensor after a few iterations to reduce that lag and automatically re-enables it if it sends new data, fixing the potential latency of the last option. This step is present in the sync + concat step of the network, as seen in Fig. 1.
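The max-delay behaviour (method C, first-come variant) can be sketched as a small state machine: branches run as frames arrive, the trunk fires when every enabled sensor has reported or the maximum delay has elapsed, and a sensor that repeatedly misses the deadline is disabled until it produces data again. The frame representation, the delay value, and the miss threshold below are illustrative assumptions.

```python
# Toy max-delay synchronizer with sensor-failure handling.
import time

MAX_DELAY = 0.05     # seconds, counted from the first frame of a cycle (assumed)
MISS_LIMIT = 3       # consecutive misses before a sensor is disabled (assumed)

class MaxDelaySync:
    def __init__(self, sensors):
        self.enabled = {s: True for s in sensors}
        self.misses = {s: 0 for s in sensors}
        self.latest = {}             # most recent branch output per sensor
        self.first_arrival = None

    def on_frame(self, sensor, branch_output):
        # a newer frame on the same channel replaces the older, unprocessed one
        self.latest[sensor] = branch_output
        self.misses[sensor] = 0
        self.enabled[sensor] = True  # a failed sensor is re-enabled by new data
        if self.first_arrival is None:
            self.first_arrival = time.monotonic()
        return self._maybe_fire()

    def tick(self):
        # call periodically so the max delay can expire even with no new frames
        return self._maybe_fire()

    def _maybe_fire(self):
        if self.first_arrival is None:
            return None
        waiting = [s for s, on in self.enabled.items()
                   if on and s not in self.latest]
        expired = time.monotonic() - self.first_arrival >= MAX_DELAY
        if waiting and not expired:
            return None              # keep waiting for the remaining sensors
        for s in waiting:            # sensors that missed the deadline
            self.misses[s] += 1
            if self.misses[s] >= MISS_LIMIT:
                self.enabled[s] = False   # stop waiting for a failed sensor
        fused = dict(self.latest)    # the trunk runs on whatever is available
        self.latest.clear()
        self.first_arrival = None
        return fused

sync = MaxDelaySync(["visible", "thermal"])
sync.on_frame("visible", "features_v")          # thermal still pending -> None
print(sync.on_frame("thermal", "features_t"))   # both present -> trunk inputs
```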
3.8 Data preparation and collection

Due to the absence of public or accessible datasets for self-driving vehicles that contain both thermal and visible spectrum images, normal driving datasets that include both spectrums were used. The public FLIR ADAS dataset, which consists of 5322 pairs of visible and thermal images that are synchronized and labeled, was used for night, fog and rain conditions. As no public datasets were available for winter conditions, our own data were collected. A Mynteye S1030 camera and a FLIR ADK were mounted on a Kia Soul EV 2017 and driven around the UQTR campus and its vicinity for several hours. Data were recorded under various weather conditions in winter, and then one image every 5 min from the recording was extracted, resulting in 256 winter-based pairs. Optimal offline synchronization was applied to obtain the best matching pair of thermal and visible images. An example of the dataset image is shown in Fig. 3, where (a) is the visible spectrum, (b) is the original uncalibrated thermal image, (c) is the loosely calibrated thermal image, and (d) and (e) are the thermal images overlayed with 40% opacity on the visible image. Figure 3b demonstrates the importance of thermal imaging, as a pedestrian who is obscured by the building shadow in the visible spectrum is only visible in the thermal image.

Figure 4 shows six other pairs of thermal and visible images that illustrate various scenarios relevant to self-driving vehicles. In these images, road objects that are more than 15 m away are aligned correctly, while off-road objects and tall elements, such as lampposts, are displaced.
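One plausible reading of the offline synchronization step is nearest-timestamp matching: each thermal frame is paired with the closest visible frame and pairs above a tolerance are discarded. The exact procedure used for the dataset is not detailed here, so the following sketch is only an assumption-laden illustration; the tolerance value is arbitrary.

```python
# Pair thermal frames with the nearest visible frame by timestamp.
import bisect

def pair_frames(visible_ts, thermal_ts, tolerance=0.05):
    if not visible_ts:
        return []
    visible_ts = sorted(visible_ts)
    pairs = []
    for t in sorted(thermal_ts):
        i = bisect.bisect_left(visible_ts, t)
        candidates = visible_ts[max(0, i - 1):i + 1]
        best = min(candidates, key=lambda v: abs(v - t))
        if abs(best - t) <= tolerance:
            pairs.append((best, t))
    return pairs

print(pair_frames([0.00, 0.033, 0.066, 0.100], [0.01, 0.07, 0.25]))
# -> [(0.0, 0.01), (0.066, 0.07)]
```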
Fig. 4 Image/thermal pairs for the 256 image self-collected fusion dataset
3.9 Training

A special training procedure was required by this novel architecture. The original weights from pretraining on datasets such as MS-COCO were used for the modified state-of-the-art networks, and then a three-step training on fusion was applied. In the first step, the network was trained on the fusion dataset, which consisted of synchronized images from all

This method also has a high average latency compared to the first one. Finally, the proposed syncing process provided low latency and high fusion frequency during an optimal stage; it didn't waste any processing on failed runs nor ignore lots of readings, while only slowing down for missing sensors. The proposed method is a more robust algorithm for processing than the current widely used methods. The increased robustness demonstrated by this approach is particularly advantageous for autonomous vehicles. It ensures better reliability, especially since sensors are often exposed to harsh environmental conditions, like snowstorms, where the risk of sensor failure is significantly higher.

In Fig. 6, the red, green, and blue curves represent the processing of the color image. The purple curve represents the processing of the thermal image. The red vertical line represents the capture of the RGB image, and the purple vertical line represents the capture of the thermal image. The processing cost could be lowered from 8 GFOP to 4 GFOP, as visible in Fig. 6, using the selected syncing algorithm and a valley-fill algorithm. This figure shows the valley-fill algorithm in its ideal form to present the best outcome; in practice, the processing cost will be slightly higher depending on which implementation is chosen [59, 60]. The latency won't be affected in normal conditions as the sensors are desynchronized. In Fig. 6b, by using the wasted latency between the capture of the visible and the thermal image, no additional latency was introduced, and the cost was reduced by half. The trunk (layers k + 1 to K) is unaffected by both methods. By reducing the peak computational demand, the use of energy-efficient, low-power edge AI computers becomes feasible, which can reduce the vehicle's energy consumption.
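The peak-cost reduction attributed to the valley-fill scheduling can be illustrated with a toy load model: without valley-fill, both branches and the trunk fall into the same processing slot; with it, each branch occupies the otherwise idle gap after its own capture and only the trunk remains for the fusion instant. The GFOP numbers are made-up and only mirror the 8-to-4 trend of Fig. 6.

```python
# Toy peak-load comparison with and without valley-fill scheduling.
BRANCH = {"thermal": 2.0, "visible": 2.0}   # GFOP for layers 1..k of each branch
TRUNK = 4.0                                  # GFOP for the shared layers k+1..K

# everything waits for the last frame -> one slot carries all the work
no_valley_fill = [sum(BRANCH.values()) + TRUNK]

# each branch runs right after its own capture; the trunk runs at fusion time
valley_fill = [BRANCH["thermal"], BRANCH["visible"], TRUNK]

print(max(no_valley_fill), "GFOP peak without valley-fill")   # 8.0
print(max(valley_fill), "GFOP peak with valley-fill")          # 4.0
```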
A comparison of the proposed end-to-end fusion-to-detection network with different calibration qualities is presented in Fig. 7. The network is trained on 1000 images from the FLIR ADAS dataset and evaluated on another 1000 images. To simulate improper geometric calibration due to large sensor distances, reverse homographic transformations are applied on the FLIR ADAS dataset. The results show that imprecise calibration achieves a similar level of mAP as perfect calibration, while allowing for more flexibility in sensor placement.
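The calibration-degradation protocol described above can be sketched with OpenCV: a small random homography is applied to the thermal image so that the thermal/visible pair no longer aligns perfectly, emulating imprecise extrinsic calibration. The perturbation magnitude and the placeholder frame are arbitrary assumptions.

```python
# Degrade thermal/visible alignment with a small random homography.
import numpy as np
import cv2

def degrade_calibration(thermal, max_shift=8.0, seed=0):
    rng = np.random.default_rng(seed)
    h, w = thermal.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + rng.uniform(-max_shift, max_shift, size=(4, 2))).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)   # homography from jittered corners
    return cv2.warpPerspective(thermal, H, (w, h))

thermal = np.zeros((512, 640), dtype=np.uint8)   # placeholder thermal frame
print(degrade_calibration(thermal).shape)         # (512, 640)
```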
The proposed end-to-end fusion-to-detection network is then compared with combinations of state-of-the-art fusion and state-of-the-art detectors to offer a fair comparison with state-of-the-art methods. The novel architecture has been trained using the 256-entry self-collected fusion dataset combined with MS COCO 2017 [62] and the FLIR ADAS dataset [63].

Two state-of-the-art fusions are presented: firstly, the DenseFuse [39] fusion, which was initially trained on the FLIR ADAS dataset [63]; secondly, the variational convolutional auto-encoder [38], used for its compelling results with minimal dataset size. The generative aspect of that method allows for generating higher-quality fused images compared to other deep-learning-based approaches. Since the proposed approach provides both fusion and detection without providing the intermediate image, it is impossible to compare with fusion methods alone. Three state-of-the-art detectors have been used. The Faster-RCNN [44] with the Resnet50 [50] backbone is a legacy detector. The SSD300 [45] provides a recent network with compelling aspects such as real-time operation and high mAP. Finally, YoloV8 [47, 48] is used as a modern implementation with high performance and low latency. The detectors under examination were equipped with models pre-trained on the Common Objects in Context (COCO) dataset. Fine-tuning was performed for 100 epochs on the new datasets to assess the performance of these detectors on the fusion datasets. The proposed entropy-driven choice of k was used to compare against the state of the art introduced at this section's beginning. The state-of-the-art fusion and detector combination results are then compared to the proposed entropy-based method in Fig. 8. The figure presents the networks evaluated on the FLIR dataset in Fig. 8a, the winter-based 256-image dataset provided by our team in Fig. 8b, and the PST-900 [64] dataset in Fig. 8c. The harsh winter-based dataset is significantly more challenging for detecting objects due to snow and night. This 256-image dataset annotated by our team contains snowstorms and other severe data, which are difficult to analyze by neural networks. The PST-900 dataset is taken in subterranean conditions for survivors' rescue. The PST-900 dataset is often taken in darkness, where the thermal image contains the primary information, unlike winter driving, where car lights make the visible image less challenging to understand.

DenseFuse performs slightly better than the VCAE method by half a percent. The SSD300 method using a state-of-the-art fusion performs the worst, at 15.8% in winter conditions and 30.1% for the FLIR ADAS dataset (Fig. 8). SSD is known in the literature to provide lower mAP than other state-of-the-art methods; however, it gives real-time results suitable for edge devices. This very low mAP indicates that SSD, unlike other state-of-the-art methods, is more sensitive to weather variations. The FasterRCNN detector performs slower and introduces latency, which could be fatal in the case of self-driving, but provides 38.4% on the FLIR dataset, 43.8% on PST-900, and 34.8% on the winter dataset (Fig. 8). For YoloV5, the mAP reaches 39.4% using the DenseFuse fusion and 23.6% on the winter dataset. YoloV8, being more precise than YOLOv5, gets 41.36% on the FLIR dataset, 36.6% on PST-900, and 24.21% on the winter dataset (Fig. 8). The end-to-end network using auto-encoders or VGG as backbones performs better than state-of-the-art detectors in all conditions by using the thermal information without the intermediary image. The fusion of thermal and visible images improves the mAP, as this fusion technique effectively detects objects obscured in shadows or otherwise imperceptible in the visible light spectrum.

5 Conclusion

This paper proposes an end-to-end fusion network with synchronization for detecting road obstacles using the fusion of thermal and visible images. Following SSD's path, our method integrates two sequential networks into an end-to-end one to reduce processing costs and increase detection performance. The fusion is applied by a branch-and-trunk approach with spectral-length-based branches and adaptive weights at the fusion layer. The proposed method also suggests using a maximum-delay syncing method that adds robustness for missing sensors while keeping the latency low. The modular branch-and-trunk approach allows for input adaptation without retraining. By using the network's modularity, partial training can reduce the bias created by dataset size during training. The proposed way of splitting the network allows for automatic branching for detection networks without any fast forward in the tensors. The fusion point is automatically selected by the proposed entropy-based method. The proposed network achieved 40.5% mAP on the challenging 256 image-pair heavily degraded winter dataset captured by our team and 44.3% on the public FLIR ADAS dataset, which contains slightly degraded conditions, using thermal images combined with visible spectrum images. It should be noted that the mAP of 44.3% by the proposed method is a significant performance jump in the aforementioned adverse condition, and including daytime thermal images in the dataset has shown a drastic improvement in the mAP reported by SOTA methods, as in the FLIR ADAS dataset. This paper proposes a novel end-to-end fusion approach for thermal and visible imaging to improve autonomous driving performance during
harsh weather like winter. It also offers a novel synchronization method for real-time low-latency synchronization that is vital in autonomous driving. That novel synchronization method is resilient to sensor failure, such as a camera disconnecting during vehicle operation. Our method enhances the robustness, efficiency, and performance of the detection system. It uses low-power edge AI computers that can handle harsh weather conditions and save battery. It also detects objects that are otherwise invisible in the visible spectrum. The proposed architecture can be easily applied to additional spectral lengths, such as ultraviolet, by adding branches and data while keeping the weights of existing branches. The architecture also applies to other tasks using neural networks with inputs of compatible types and a convolutional layer acting as the fusion layer. While the proposed network achieves a satisfactory mAP for self-driving in favorable weather conditions, it struggles in adverse conditions such as winter. In winter, the network surpasses the state-of-the-art methods that use conventional convolutional architectures and shows the feasibility of using attention mechanisms for detection tasks. It also demonstrates the potential of low-computation and high-robustness detection systems. Future work will explore how to enhance the detector's mAP in challenging weather conditions, such as winter scenarios, by employing Transformers as the backbone of the architecture, by applying the proposed architecture to LiDAR-based detection networks, and by fusing a state-of-the-art detection network with a point cloud-based network.

Acknowledgements This research was funded by the Natural Sciences and Engineering Research Council of Canada and the Canada Research Chair Program.

Author Contributions J.B. wrote the main manuscript. A.A., M.Z.A., S.K., and K.A. provided research supervision. J.B., M.O., and L.Z. contributed to the code and experiments. S.K. and K.A. worked on funding acquisition. All authors reviewed the manuscript.

Data availability No datasets were generated or analysed during the current study.

Declarations

Conflict of interest The authors declare no conflict of interest.

References

1. Nguyen, V.N., Jenssen, R., Roverso, D.: Ls-net: fast single-shot line-segment detector. Mach. Vis. Appl. (2021). https://doi.org/10.1007/s00138-020-01138-6
2. Murthy, C.B., Hashmi, M.F., Keskar, A.G.: Efficientlitedet: a real-time pedestrian and vehicle detection algorithm. Mach. Vis. Appl. 33(3), 47 (2022)
3. Yao, J., Huang, B., Yang, S., Xiang, X., Lu, Z.: Traffic sign detection and recognition under low illumination. Mach. Vis. Appl. 34(5), 75 (2023)
4. Boisclair, J., Kelouwani, S., Ayevide, F.K., Amamou, A., Alam, M.Z., Agbossou, K.: Attention transfer from human to neural networks for road object detection in winter. IET Image Proc. (2022). https://doi.org/10.1049/ipr2.12562
5. Zhao, Z., Zheng, P., Xu, S., Wu, X.: Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30(11), 3212–3232 (2019). https://doi.org/10.1109/TNNLS.2018.2876865
6. Ji, K., Lei, W., Zhang, W.: A deep retinex network for underwater low-light image enhancement. Mach. Vis. Appl. 34(6), 122 (2023)
7. Malik, M., Majumder, S.: An integrated computer vision based approach for driving assistance to enhance visibility in all weather conditions. In: International and National Conference on Machines and Mechanisms
8. Ghamisi, P., Rasti, B., Yokoya, N., Wang, Q., Hofle, B., Bruzzone, L., Bovolo, F., Chi, M., Anders, K., Gloaguen, R., Atkinson, P.M., Benediktsson, J.A.: Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art. IEEE Geosci. Remote Sens. Mag. 7(1), 6–39 (2019). https://doi.org/10.1109/MGRS.2018.2890023
9. Du, H., Hao, X., Ye, Y., He, L., Guo, J.: A camera style-invariant learning and channel interaction enhancement fusion network for visible-infrared person re-identification. Mach. Vis. Appl. 34(6), 117 (2023)
10. Watt, N., Plessis, M.C.: Neuro-augmented vision for evolutionary robotics. Mach. Vis. Appl. 34(6), 95 (2023)
11. Coenen, M., Schack, T., Beyer, D., Heipke, C., Haist, M.: Consinstancy: learning instance representations for semi-supervised panoptic segmentation of concrete aggregate particles. Mach. Vis. Appl. 33(4), 57 (2022)
12. Singha, A., Bhowmik, M.K.: Tu-vdn: Tripura university video dataset at night time in degraded atmospheric outdoor conditions for moving object detection. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 2936–2940. IEEE
13. Liu, Q., Lu, X., He, Z., Zhang, C., Chen, W.-S.: Deep convolutional neural networks for thermal infrared object tracking. Knowl. Based Syst. 134, 189–198 (2017)
14. Jonsson, P.: Remote sensor for winter road surface status detection. In: 2011 IEEE SENSORS, pp. 1285–1288. IEEE
15. Light, J., Parthasarathy, S., McIver, W.: Monitoring winter ice conditions using thermal imaging cameras equipped with infrared microbolometer sensors. Procedia Comput. Sci. 10, 1158–1165 (2012)
16. Fetzer, G.J., Sitter, D.N., Jr., Gugler, D., Ryder, W.L., Griffis, A.J., Miller, D., Gelbart, A., Bybee-Driscoll, S.: Ultraviolet, Infrared, and Near-infrared Lidar System and Method (2010)
17. Shopovska, I., Jovanov, L., Philips, W.: Deep visible and thermal image fusion for enhanced pedestrian visibility. Sensors (2019). https://doi.org/10.3390/s19173727
18. Chebrolu, K.N.R., Kumar, P.N.: Deep learning based pedestrian detection at all light conditions. In: Proceedings of the 2019 IEEE International Conference on Communication and Signal Processing, ICCSP 2019, pp. 838–842. https://doi.org/10.1109/ICCSP.2019.8698101
19. Bercier, E., Louvat, B., Harant, O., Balit, E., Bouvattier, J., Nacsa, L.: Far-infrared thermal camera: an effortless solution for improving adas detection robustness. In: Proceedings of SPIE—The International Society for Optical Engineering, vol. 11009. https://doi.org/10.1117/12.2520364
20. Hwang, S., Park, J., Kim, N., Choi, Y., So Kweon, I.: Multispectral pedestrian detection: benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1037–1045
21. Yang, R., Zhu, Y., Wang, X., Li, C., Tang, J.: Learning target-oriented dual attention for robust rgb-t tracking. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 3975–3979. IEEE
22. Li, H., Wu, X.: Densefuse: a fusion approach to infrared and visible images. IEEE Trans. Image Process. 28(5), 2614–2623 (2019). https://doi.org/10.1109/TIP.2018.2887342
23. Huangfu, Y., Campbell, L., Habibi, S.: Temperature effect on thermal imaging and deep learning detection models. In: 2022 IEEE Transportation Electrification Conference & Expo (ITEC), pp. 185–189. IEEE (2022)
24. Erhan, D., Courville, A., Bengio, Y., Vincent, P.: Why does unsupervised pre-training help deep learning? In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 201–208. JMLR Workshop and Conference Proceedings (2010)
25. Tu, L., Qin, Z., Yang, L., Wang, F., Geng, J., Zhao, S.: Identifying the Lambertian property of ground surfaces in the thermal infrared region via field experiments. Remote Sens. 9(5), 481 (2017)
26. Yeong, D.J., Velasco-Hernandez, G., Barry, J., Walsh, J.: Sensor and sensor fusion technology in autonomous vehicles: a review. Sensors 21(6), 2140 (2021)
27. Li, Y., Jha, D.K., Ray, A., Wettergren, T.A.: Feature level sensor fusion for target detection in dynamic environments. In: 2015 American Control Conference (ACC), pp. 2433–2438. IEEE (2015)
28. Kandylakis, Z., Vasili, K., Karantzalos, K.: Fusing multimodal video data for detecting moving objects/targets in challenging indoor and outdoor scenes. Remote Sens. 11(4), 446 (2019)
29. Yang, Y., Lee, W., Osteen, P., Geneva, P., Zuo, X., Huang, G.: icalib: inertial aided multi-sensor calibration. In: VINS Workshop (2021)
30. Mirzaei, F.M.: Extrinsic and Intrinsic Sensor Calibration. PhD thesis, University of Minnesota (2013)
31. Ackermann, J.: Robustness against sensor failures. Automatica 20(2), 211–215 (1984). https://doi.org/10.1016/0005-1098(84)90027-X
32. Azarang, A., Manoochehri, H.E., Kehtarnavaz, N.: Convolutional autoencoder-based multispectral image fusion. IEEE Access 7, 35673–35683 (2019). https://doi.org/10.1109/ACCESS.2019.2905511
33. Guan, Q., Ren, S., Chen, L., Feng, B., Yao, Y.: A spatial-compositional feature fusion convolutional autoencoder for multivariate geochemical anomaly recognition. Comput. Geosci. 156, 104890 (2021). https://doi.org/10.1016/j.cageo.2021.104890
34. Kwarteng, P., Chavez, A.: Extracting spectral contrast in Landsat thematic mapper image data using selective principal component analysis. Photogramm. Eng. Remote. Sens. 55(1), 339–348 (1989)
35. Carper, W., Lillesand, T., Kiefer, R.: The use of intensity-hue-saturation transformations for merging spot panchromatic and multispectral image data. Photogramm. Eng. Remote. Sens. 56(4), 459–467 (1990)
36. Laben, C.A., Brower, B.V.: Process for enhancing the spatial resolution of multispectral imagery using pan-sharpening. Google Patents. US Patent 6,011,875 (2000)
37. Aiazzi, B., Baronti, S., Selva, M.: Improving component substitution pansharpening through multivariate regression of ms + pan data. IEEE Trans. Geosci. Remote Sens. 45(10), 3230–3239 (2007)
38. Ren, L., Pan, Z., Cao, J., Liao, J.: Infrared and visible image fusion based on variational auto-encoder and infrared feature compensation. Infrared Phys. Technol. 117, 103839 (2021). https://doi.org/10.1016/j.infrared.2021.103839
39. Li, H., Wu, X.-J.: Densefuse: a fusion approach to infrared and visible images. IEEE Trans. Image Process. 28(5), 2614–2623 (2018)
40. Lee, J., Shiotsuka, D., Nishimori, T., Nakao, K., Kamijo, S.: Gan-based lidar translation between sunny and adverse weather for autonomous driving and driving simulation. Sensors 22(14), 5287 (2022)
41. Ahmad, K., Pogorelov, K., Riegler, M., Conci, N., Halvorsen, P.: Cnn and gan based satellite and social media data fusion for disaster detection. In: MediaEval (2017)
42. Wang, C., Yang, G., Papanastasiou, G., Tsaftaris, S.A., Newby, D.E., Gray, C., Macnaught, G., MacGillivray, T.J.: Dicyc: Gan-based deformation invariant cross-domain information fusion for medical image synthesis. Inf. Fusion 67, 147–160 (2021)
43. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: Pytorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32, pp. 8024–8035. Curran Associates, Inc. (2019)
44. Zhao, X., Li, W., Zhang, Y., Gulliver, T.A., Chang, S., Feng, Z.: A faster rcnn-based pedestrian detection system. In: 2016 IEEE 84th Vehicular Technology Conference (VTC-Fall), pp. 1–5 (2016)
45. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer, Berlin
46. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271
47. Terven, J., Cordova-Esparza, D.: A comprehensive review of yolo: From yolov1 to yolov8 and beyond. arXiv preprint arXiv:2304.00501 (2023)
48. Jocher, G., Chaurasia, J.Q.A.: Yolo by Ultralytics (2023)
49. Wang, C.-Y., Bochkovskiy, A., Liao, H.-Y.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv (2022). https://doi.org/10.48550/ARXIV.2207.02696
50. Kaiming, H., Xiangyu, Z., Shaoqing, R., Jian, S.: Deep residual learning for image recognition. arXiv:1512.03385 (2015)
51. Alex, K.: One weird trick for parallelizing convolutional neural networks. arXiv:1404.5997 (2014)
52. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
53. Koonce, B.: Efficientnet. In: Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization, pp. 109–123 (2021)
54. Xia, X., Xu, C., Nan, B.: Inception-v3 for flower classification. In: 2017 2nd International Conference on Image, Vision and Computing (ICIVC), pp. 783–787. IEEE (2017)
55. Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., Le, Q.V.: Mnasnet: platform-aware neural architecture search for mobile. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2820–2828 (2019)
56. Karen, S., Andrew, Z.: Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014)
57. Soekhoe, D., Van Der Putten, P., Plaat, A.: On the impact of data set size in transfer learning using deep neural networks. In: Advances in Intelligent Data Analysis XV: 15th International Symposium, IDA 2016, Stockholm, Sweden, October 13–15, 2016, Proceedings 15, pp. 50–60. Springer, Berlin (2016)
58. Blything, R., Biscione, V., Vankov, I.I., Ludwig, C.J., Bowers, J.S.: The human visual system and cnns can both support robust online translation tolerance following extreme displacements. J. Vis. 21(2), 9–9 (2021)
59. Valentine, K., Temple, W.G., Zhang, K.M.: Intelligent electric vehicle charging: rethinking the valley-fill. J. Power Sources 196(24), 10717–10726 (2011)
60. Ma, Z.: Decentralized valley-fill charging control of large-population plug-in electric vehicles. In: 2012 24th Chinese Control and Decision Conference (CCDC), pp. 821–826. IEEE (2012)
61. Kaempchen, N., Dietmayer, K.: Data synchronization strategies for multi-sensor fusion. In: Proceedings of the IEEE Conference on Intelligent Transportation Systems, vol. 85, pp. 1–9
62. Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)
63. LLC, T.F.: Flir adas dataset. Online (2019)
64. Shivakumar, S.S., Rodrigues, N., Zhou, A., Miller, I.D., Kumar, V., Taylor, C.J.: Pst900: Rgb-thermal calibration, dataset and segmentation network. In: 2020 IEEE International Conference on Robotics and Automation (ICRA), pp. 9441–9447. IEEE (2020)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Jonathan Boisclair received a B.Sc.A. in computer science in April of 2017. He completed the M.Sc. degree in 2019 from the Université du Québec à Trois-Rivières (UQTR), Trois-Rivières, QC, Canada. In September of 2019, he began a Ph.D. in mechanical engineering at Université du Québec à Trois-Rivières to improve his knowledge of applied artificial intelligence. His main research interests are artificial intelligence, autonomous driving, advanced driving techniques, and real computer intelligence.

Ali Amamou received the B.S. degree in Industrial Computing and Automatic Science from the National Institute of Applied Sciences and Technology, Tunis, Tunisia, in 2013 and the M.S. degree in Embedded Systems Science from Arts et Métiers ParisTech University, Aix-en-Provence, France, in 2014. Between 2015 and 2018, he completed his Ph.D. degree in electrical engineering at Université du Québec à Trois-Rivières (UQTR), Canada. In May 2018, he started a postdoctoral fellowship at the Hydrogen Research Institute. His main research interests are the optimization of energy systems for stationary and mobile applications, hybridization of energy sources for vehicular application, and eco-energy navigation of low-speed autonomous electric vehicles.

Sousso Kelouwani received the Ph.D. degree in robotics systems from the Ecole Polytechnique de Montreal, in 2011. He completed his Postdoctoral Internship on fuel cell hybrid electric vehicles with the University of Quebec at Trois-Rivières (UQTR), in 2012. He developed expertise in the optimization and intelligent control of vehicular applications. He has been a Full Professor of Mechatronics with the Department of Mechanical Engineering, since 2017, and a member of the Hydrogen Research Institute. He holds four patents in the U.S. and Canada. He has published more than 100 scientific articles. His research interests include optimizing energy systems for vehicle applications, advanced driver assistance techniques, and intelligent vehicle navigation taking into account Canadian climatic conditions. He is the holder of the Canada Research Chair in Energy Optimization of Intelligent Transport Systems and of the Noovelia Research Chair in Intelligent Navigation of Autonomous Industrial Vehicles. Prof. Kelouwani was co-president and president of the technical committee of the IEEE International Conferences on Vehicular Power and Propulsion in Chicago (USA, 2018) and in Hanoi (Vietnam, 2019). He is the winner of the Canada Governor General's Gold Medal, in 2003, and a member of the Order of Engineers of Quebec. In 2019, his team received the First Innovation Prize in partnership with DIVEL, awarded by the Association des Manufacturiers de la Mauricie et Centre-du-Québec for the development of an autonomous and natural navigation system. In 2017, he received the Environment Prize from the Gala des Grands Prix d'excellence en transport, the Association québécoise du Transport (AQTr), for the development of hydrogen range extenders for electric vehicles.

M. Zeshan Alam received his B.S. degree in Computer Engineering from COMSATS University, Pakistan, M.S. degree in Electrical and Electronics Engineering from the University of Bradford, UK, and Ph.D. in Electrical Engineering and Cyber-Systems from Istanbul Medipol University, Turkey. He worked at the University of Cambridge as a post-doctoral fellow, where his work focused on computer vision and machine learning models. He recently joined Brandon University, Canada, as an assistant professor while also working as a Computer Vision Consultant at Vimmerse INC. His research interests include immersive videos, computational imaging, computer vision, and machine learning modeling.