
Chapter 7

Latest Advancements in Perception Algorithms for ADAS and AV Systems Using Infrared Images and Deep Learning

Suganthi Srinivasan, Rakesh Rajegowda and Eshwar Udhayakumar

Abstract

The perception system plays an important role in advanced driver assistance systems (ADAS) and autonomous vehicles (AV) in understanding the surrounding environment and supporting further navigation. It is highly challenging to achieve accurate perception for the ego vehicle that mimics human vision. The available ADAS and AV solutions are able to perceive the environment to some extent using multiple sensors such as lidars, radars and cameras. National Highway Traffic Safety Administration crash reports for ADAS and AV systems show that complete autonomy is challenging to achieve using the existing sensor suite. Particularly in extreme weather, low light and night scenarios, there is a need for additional perception sensors. The infrared camera appears to be one of the potential sensors to address such extreme and corner cases. This chapter aims to discuss the advantages of adding infrared sensors to perceive the environment accurately. Advancements in deep learning approaches further help to enhance ADAS features. The limitations of current sensors, the need for infrared sensors and technology, artificial intelligence and current research using IR images are also discussed in detail. The literature shows that adding an IR sensor to the existing sensor suite may lead the way to achieving level 3 and above autonomous driving precisely.

Keywords: perception, ADAS, autonomous vehicle, visible image, infrared image, automotive sensors, deep learning, object detection

1. Introduction

Recent developments in sensor technology, processors and the computer vision algorithms used for processing the captured data have led to the rapid development of perception solutions for advanced driver assistance systems (ADAS) and autonomous vehicles (AV). The drive for autonomous vehicles was triggered by the DARPA Grand Challenge. The US Armed Forces and the Defense Advanced Research Projects Agency (DARPA) conducted a robotic challenge towards the development of unmanned autonomous systems which could eventually replace human drivers in combat zones and hazardous areas without remote operators. Several DARPA Grand Challenges were organized to develop the technology for fully autonomous ground vehicles with collaboration across diverse fields. The first challenge took place in 2004, when 15 self-driving vehicles competed to navigate approximately 228 km across the desert in Primm, Nevada. None of the teams succeeded due to the technological hurdles involved. The second event was held in 2005 in southern Nevada, where 5 teams competed to navigate 212 km. With better technology, this time Stanford University's Stanley managed to complete the distance and won the prize money. In 2007, the third event, commonly known as the DARPA Urban Challenge, took place in an urban environment. Here, the teams needed to showcase autonomous driving capability in real traffic scenarios and perform complete maneuvers, including braking and parking. The Boss vehicle from Carnegie Mellon University won the first prize and the Junior vehicle from Stanford University claimed the second prize [1]. Since the DARPA Grand Challenges, accurate perception and autonomous navigation of the vehicle have become one of the hottest fields for research and industry. There are 6 levels of Society of Automotive Engineers (SAE) International standards for moving from ADAS to fully autonomous driving, as shown in Figure 1. Up to level 3, the presence of a driver is mandatory to take control of the vehicle whenever needed. Levels 4 and 5 allow for fully autonomous driving with and without a driver, respectively [3].
Worldwide, many major accidents are reported due to human error, which can result in fatal incidents. Considering the safety of drivers and passengers, ADAS systems have been targeted by top manufacturers to support drivers during unpredicted circumstances. The most common ADAS features available include lane departure warning, forward collision warning, high beam assist, traffic sign recognition, adaptive cruise control and so on. ADAS systems are semi-autonomous driving concepts that assist drivers during driving. The objective is to automate, adapt and enhance safety by reducing human errors. A fully autonomous vehicle is capable of sensing the environment and navigating without human intervention under all environmental circumstances. Here, the vehicle is capable of perceiving the environment, thinking and reasoning to take decisions, and controlling itself autonomously, similar to a human driver [4]. The major components of autonomous vehicles include sensing, perception, decision making such as path and motion planning, and actuation by generating controls for steering and braking.

Figure 1.
Levels of automation according to SAE standards [2].
Like a human driver, ADAS and AV systems aim to perceive the environment using various sensors, which helps them navigate autonomously. Passive sensors such as visible cameras and active sensors such as lidars, radars and ultrasonic sensors are the most commonly preferred sensors for perceiving the surrounding environment. Multiple sensors of different types are configured in such a way that the complete 360° surrounding environment can be perceived.
Figure 2 shows a state-of-the-art ADAS solution and the sensors used to perceive and derive information from the environment. This article focuses mainly on the accurate perception of the environment using various sensors, including vision, lidar, radar and ultrasonic sensors. Vision sensors are used either as monocular and/or stereo cameras. They are mostly used for 2D/3D object and pedestrian detection, and for the detection and recognition of lanes, parking slots, traffic signs and signals. Both 2D and 3D laser scanners (lidar) are precise in determining an object's position, orientation and dimensions. All moving and static objects, including buildings and surroundings along with road markings and kerbs, can be detected using laser scanners. Automotive lidars are capable of scanning the environment both vertically and horizontally, covering anything from a limited field of view up to a full 360° view. On the other hand, radars generate position and relative velocity information more accurately for the detected objects. Each sensor exhibits its own advantages and disadvantages [3, 4]. For example, in adverse weather conditions, the performance of a camera sensor is not as reliable as that of a radar sensor. Similarly, a camera sensor lacks depth estimation, whereas radars and lidars estimate depth more accurately. Individual sensors may not be enough to achieve level 2 and above ADAS solutions. Hence, to achieve the goal of level 3 and above autonomy, multiple sensor modalities in different configurations are beneficial [3–5]. The fusion of complementary and redundant information from various sensors helps to generate the complete perception for ADAS and AV systems [4, 6]. In the automotive industry, many original equipment manufacturers (OEMs) and Tier-1s are extensively concentrating on the research and development of successful autonomous driving concepts and level 3 and above ADAS. Waymo, Uber, Tesla, Zoox, etc., are a few of the companies engaged in autonomous vehicle research and development. Every autonomous vehicle is equipped with one or more automotive sensors, including cameras, lidars and radars, to sense the environment as a perception system, supported by guidance and navigation systems.

Figure 2.
State-of-the-art ADAS sensors.

Manufacturers and operators of ADAS (level 2 and above) and AV systems need to report crashes to the US agency as per a general standing order issued by the National Highway Traffic Safety Administration (NHTSA) in 2021 [2]. An average of 14 crashes per month have been reported since July 2021, with a maximum of 22 and a minimum of 8 crashes in a month, by vehicles equipped with AV (level 3 and above) systems still in development. Similarly, an average of 44 crashes per month have been reported since July 2021, with a maximum of 62 and a minimum of 26 crashes in a month, by vehicles equipped with level 2 ADAS. Autonomous vehicle collision reports show that the on-road experience of supposedly mature ADAS and autonomous driving technology still needs improvement to match human driver perception. Numerous incidents, both major and minor crashes, have been reported by self-driving cars under road testing. This shows that the existing sensor suites claimed to achieve level 4 autonomous driving lack performance, especially during extreme weather conditions, dark/night scenarios, glare and so on. This demands more robust sensors to achieve fully autonomous driving irrespective of environmental conditions.
The recent advancements in artificial intelligence (AI) have a significant impact on the fast development and deployment of level 3 and above ADAS and AV solutions. In particular, to generate precise information about the surrounding environment, large volumes of data from different sensing modalities and advanced computing resources play a key role in enabling AI as an essential component of ADAS and AV perception systems [7]. Extensive research and development effort is currently invested in analyzing the effective use of AI in various AV functionalities such as perception, planning and control, localization and mapping, and decision making.
In this article, the limitations and challenges of the existing sensor suite, the need for infrared cameras, infrared technology, applications of AI in the development of perception systems and the need for a multi-sensor fusion strategy are presented in detail. Also, some of the latest research in the field of infrared sensors and deep learning approaches for ADAS and AV systems is discussed.

2. Perception sensors for ADAS and AV systems

In ADAS and AV systems, sensors are considered the equivalent of the eyes and ears of the human driver for sensing and perceiving the environment. Different aspects of the environment are sensed and monitored using various types of sensors, and the information is shared with the driver or the electronic control unit. This section introduces commonly used automotive sensors and their functionality in achieving level 2 and above ADAS and AV solutions. Figure 3 shows a representative image of a vehicle equipped with perception sensors for one or more ADAS and AV solutions [9].

2.1 Vision sensors

Figure 3.
Representative figure of AV/ADAS vehicle with various perception sensors. Figure from [8].

RGB or visible cameras are the most commonly used sensors in ADAS and AV systems due to their low cost and easy installation. Normally, more than one camera is used to capture/sense the complete environment. Images captured by vision sensors are processed by an embedded system to detect, analyze, understand and track various objects in the environment. Captured images are rich in information such as color, contrast, texture and detail, which are unique features compared to other sensors. Visible cameras are used either as a single-lens camera, called a monocular camera, or as a two-lens camera, called a stereo setup. Monocular cameras are low-cost and require less processing power. They are commonly used for object detection and classification, lane and parking line detection, traffic sign recognition, etc. Monocular cameras lack distance/depth information compared to other active sensors. There are a few techniques used to estimate distance, but they are not as accurate as required by ADAS and AV systems for maneuvering autonomously. On the other hand, stereo cameras are useful for extracting depth or distance information, as the system consists of two lenses separated by a distance, resembling the two eyes of a human. Such systems are highly beneficial for detecting and classifying objects in the environment along with depth/distance information, with better accuracy compared to monocular cameras. Compared to the other automotive sensors, depth estimation using stereo cameras is reliable only over a short distance of up to about 30 m [10]. Autonomous vehicles demand accurate distance estimation at far distances, especially on highways (Figure 4) [3, 10].
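For reference, the depth recovered by a stereo pair follows the standard triangulation relation (textbook form, not specific to any particular camera), where f is the focal length, B the baseline between the two lenses and d the measured disparity; the error term shows why accuracy degrades quadratically with distance:

```latex
Z = \frac{f\,B}{d}, \qquad \Delta Z \approx \frac{Z^{2}}{f\,B}\,\Delta d
```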

Figure 4.
Common adverse scenarios where current ADAS/AV sensor suite struggles to perform.


2.1.1 Lidar

LiDAR stands for Light Detection and Ranging. It is an active sensor that works by emitting a laser beam which is reflected by any object; the time between the emitted laser beam and its reflection is used to measure the distance of the object. These sensors are capable of generating high-resolution 3D point clouds and operate at longer ranges than vision sensors. They can also generate 360° 3D images surrounding the ego vehicle with accurate depth information. In recent autonomous vehicles, the LiDAR sensor plays a major role in driving the vehicle autonomously by generating accurate and precise environment perception. ADAS functions such as autonomous braking, parking solutions, collision avoidance, object detection, etc., can be achieved with higher accuracy using lidar sensors. A major drawback of lidars is that they are bulky and expensive. Also, extreme weather conditions such as rain and fog can impact the performance of lidar sensors. Due to the latest advancements in semiconductor technology, significantly smaller and less expensive lidars may be possible in the future [3, 10].
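The distance measurement itself follows the simple time-of-flight relation, with c the speed of light and \Delta t the round-trip time between emission and reception:

```latex
d = \frac{c\,\Delta t}{2}
```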

2.1.2 Radar

Radar stands for Radio Detection and Ranging. It is an active sensor that works on the principle of the Doppler effect. Radars emit microwave energy and measure the frequency difference between the emitted and reflected beams in order to estimate the speed and distance of the object from which the energy is reflected. Radar is capable of detecting objects at a longer distance than lidar and vision sensors, and it performs equally well in all weather conditions, including extreme conditions such as rain and fog. Radars are classified as short, medium and long-range sensors. Short and medium-range sensors are mostly used for blind spot detection and cross-traffic alert and are mounted at the corners of the vehicle. Long-range radars are mostly used for adaptive cruise control and are mounted near the front/rear bumpers [3, 10].
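The speed estimate follows the standard Doppler relation for a monostatic radar, with f_d the measured frequency shift, f_0 the transmitted frequency, c the speed of light and v_r the relative radial velocity of the target:

```latex
f_d = \frac{2\,v_r\,f_0}{c} \quad\Longrightarrow\quad v_r = \frac{c\,f_d}{2\,f_0}
```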

2.1.3 Ultrasonic sensors

Ultrasonic sensors are active sensors which use sound waves to measure the distance between the ego vehicle and nearby objects. Such sensors are most commonly used to detect objects close to the vehicle, such as kerbs, especially in parking spaces [3, 10].
Other sensors such as GPS and IMUs are also used in most ADAS and AV use cases; they are used to measure the position and for localization of the ego vehicle. Table 1 shows a consolidated summary of the advantages and disadvantages of various sensors for ADAS and AV systems.

2.1.4 Challenges with existing sensor suite

Human-perceivable sunshine is a small part of the spectrum of solar irradiance, which contains approximately 5% ultraviolet, 43% visible and 52% infrared wavelengths [12]. At night, streetlamps and headlamps are the primary sources of light. However, the lighting pattern of vehicle headlamps is strictly regulated for safety reasons: the range of low and high beams can only vary from 60 m to 150 m. The visibility of the targets of interest may vary based on the light reflections from their surfaces. Diffuse reflection is observed when the surface is rough, like asphalt, clothing or wood, and specular reflection is observed when the surface is mirror-like, wet, epoxy or metallic. However, as shown in Figure 4, adverse conditions such as direct sun glare, dense fog, heavy rain, high beam glare, surface reflections and low light cause light reflections that reduce visibility for RGB cameras.

Sensor | Advantages | Disadvantages
Radar | Long range; works in poor visibility; consumes less power; robust to failure; small, lightweight and affordable | Low accuracy and resolution; mutual interference between radars; poor azimuthal and elevation resolution; error-prone object classification; no surround-view perception
Lidar | Long range; surround-view perception; good accuracy and resolution; no significant interferences | Expensive and high-maintenance; sparse transmission; affected by varying climatic conditions; poor small-object detection
Ultrasonic | Low in cost and small in dimensions; higher resolution at short ranges; overcomes pedestrian occlusion problems | Only short-range distances; sensitive to temperatures; prone to interference and reverberation; picks up noise from the environment
Camera | High resolution and color scales; surround-view and 3D information; low cost and maintenance; small size and easy to deploy | Requires powerful computation; poor distance estimation; sensitive to adverse conditions; inaccurate during low light
Infrared | Works in all light conditions; sensing range can cover up to 200 m; better vision through dust, fog and snow | Demanding computation resources; classification issues in cold conditions; challenging to detect and classify

Table 1.
List of advantages and disadvantages of various sensors for ADAS/AV perception systems [11].

Figure 5.
Electromagnetic spectrum: representation of the IR range.
Existing ADAS solutions in the market predominantly use vision and ultrasonic sensors to achieve level 1 features such as warnings and alerts for traffic lights and detecting obstacles while reversing the vehicle. To achieve level 2 and above ADAS features, vision, radar and ultrasonic sensors are used either as individual sensor modalities or in combination. For example, a vision sensor is capable of accurately detecting and classifying on-road objects, static objects, etc., whereas a radar sensor is capable of accurately generating the position, velocity and distance of the objects from the vehicle. Hence, these two pieces of information are fused to generate a combined representation of the detected objects with their position, velocity and class. Here, each object can be represented with richer information so that the guidance and navigation module can make proper decisions and plans for the vehicle to move autonomously. Fusion of the information from multiple sensor modalities implies the combined representation of complementary and redundant information in order to represent the environment more precisely. This also extends the feasibility of ADAS and AV systems to function in all weather/lighting conditions [6]. However, the above sensor fusion is best suited to address the challenges during daylight and is not fully adequate at nighttime. The sensor combinations in the latest ADAS and AV solutions are still not able to generate environment perception as precisely as a human driver perceives it. The latest standing general order crash report [2], available on the National Highway Traffic Safety Administration web page, clearly shows that the existing sensor suite is not sufficient to achieve level 4 and above autonomous driving. The systems fail especially during adverse weather conditions, low light and dark scenarios, extreme sun glare and so on. After the accident of the autonomous Uber car in 2018 [13], the research community started considering including the infrared sensor in the ADAS sensor fusion suite. This shows that there is a need for another sensor which can complement the information, especially during extreme weather and lighting conditions. To enable level 4 or 5 ADAS functionalities with zero human intervention, it is necessary to make the system more robust to various weather and lighting conditions [11, 14].

2.1.5 Need for infrared sensors for ADAS and AV

Most state-of-the-art approaches use camera and lidar sensors as the major sensing modalities for detecting objects, mostly because these sensors provide dense image pixel information or high-density point cloud data. However, lidar and radar sensors are costly and computationally expensive. Vision sensor-based perception algorithms depend on the brightness and contrast of the captured images. They are cost-effective sensors and use either image processing or deep learning-based techniques for object detection. Due to limited image features, CNN-based algorithms are able to detect objects only in good lighting conditions or above a minimum lux level. Moreover, vision sensor-based approaches often fail in daylight glare, night-time glare, fog, rain and strong light/direct sun situations. The performance may also degrade in poor lighting conditions such as dark scenarios, dams, tunnels, parking garages, etc. Similarly, contamination of the camera lens and increased scene complexity, such as the detection of pedestrians in crowded environments, are always challenging for any vision-based perception algorithm [15].
Apart from the above challenges, vision sensors are also limited by the distance of the objects. Reasonable performance can be expected up to 20 meters using visible images, and hence recognition of long-distance objects is limited. Vision sensors also have limitations in low light, fog and rainy weather conditions for sensing the environment precisely. These limitations and challenges can be partially overcome by adding sensors such as lidars and radars that can detect objects at long distances even in fog and rain. However, recent accidents involving Uber and Tesla autonomous vehicles indicate that a sensor suite comprising vision sensors, lidars and radars is not sufficient, especially for the detection of cars and pedestrians in extreme weather and lighting conditions. A robust perception algorithm is expected to work in all lighting and adverse weather conditions, as shown in Figure 4. Hence, the ADAS/autonomous driving sensor suite requires a night vision-compliant data-capturing sensor, such as infrared or thermal imagery. Infrared (IR) sensors look promising, especially in extreme conditions such as poor lighting, night, bright sun glare and inclement weather. IR sensors are capable of classifying vehicles, pedestrians, animals and other objects in common driving conditions. They perform equally well in daylight and dark scenarios, and they outperform the other sensors used for ADAS and AV applications in low light and dark lighting conditions [15].

3. Infrared sensor technology

Infrared falls within the electromagnetic spectrum, with wavelengths longer than visible light and shorter than radio waves; IR ranges from 0.75 μm to 1000 μm. Any object with an absolute temperature above 0 K is capable of radiating infrared energy. Generally, this radiation is a measure of internal energy due to the acceleration of electrically charged particles; hotter objects radiate more energy. It is invisible to human eyes but is sensed as warmth on the skin. The electromagnetic spectrum and the IR range are shown in Figure 5. An infrared sensor is an electronic device which is capable of emitting and detecting infrared radiation within the IR range. It works on three basic laws of physics [16–18]:

1. Planck's law of radiation: any object whose temperature is not equal to absolute zero (0 K) emits radiation.

2. Stefan–Boltzmann law: the total energy emitted by a black body over all wavelengths is related to its absolute temperature.

3. Wien's displacement law: objects at different temperatures emit spectra whose peaks are at different wavelengths that are inversely proportional to the temperature.

Figure 6 shows the radiant exitance of a perfect black body according to Planck's law, with the maximum peak inversely proportional to the temperature as per Wien's displacement law.
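For reference, the three laws can be written in their standard textbook forms (symbols follow the usual conventions and are not tied to any particular detector):

```latex
% Planck's law: spectral radiant exitance of a black body at temperature T
M_{\lambda}(T) = \frac{2\pi h c^{2}}{\lambda^{5}}\,\frac{1}{e^{hc/(\lambda k_{B} T)} - 1}

% Stefan-Boltzmann law: total exitance over all wavelengths
M(T) = \sigma T^{4}, \qquad \sigma \approx 5.67 \times 10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}

% Wien's displacement law: wavelength of peak emission
\lambda_{\max} = \frac{b}{T}, \qquad b \approx 2898\ \mathrm{\mu m\,K}
```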
Based on their operating principle, IR sensors are broadly classified into thermal and photonic (quantum) detectors. Infrared rays emitted by objects are captured by thermal sensors and converted into heat, which is then transformed into a change in resistance; a thermo-electromotive force extracts the output. Quantum sensors use the photoconductive and photovoltaic effects in semiconductors and PN junctions. For ADAS and AV applications, thermal sensors are the most widely accepted. Another classification of IR detectors, into cooled and uncooled detectors based on operating temperature, is often used at the initial stage. Based on the detector's construction, IR sensors can be further classified into single, linear or array detectors. The most commonly used detector arrangement is the focal plane array (FPA) sensor, which consists of multiple single detectors arranged as a matrix [16–18]. A further classification of IR sensor types is based on the operating frequency band, as IR detectors operate in the band where maximum transmission with minimal absorption is possible [16–18]. They are generally classified as:

Figure 6.
Radiant exitance of a black body according to Planck's law [16].

1. NIR: near-infrared radiation falls in the range from 0.75 μm to 1 μm.

2. SWIR: short-wave infrared radiation falls in the range between 1 μm and 2.5 μm.

3. MWIR: mid-wave infrared radiation falls in the range between 3 μm and 5 μm.

4. LWIR: long-wave infrared radiation falls in the range between 8 μm and 12 μm.

A pictorial representation of the IR sensor types based on the operational frequency bands is shown in Figure 7, and Table 2 summarizes the specifications of the various IR sensor types. NIR and SWIR cameras are known as 'reflective infrared' cameras; like RGB cameras, they require an external light source for illumination. NIR is mainly used in in-cabin applications for driver monitoring systems, whereas SWIR provides more context information, such as lane markings, traffic signs, etc. Unfortunately, SWIR camera applications are uncommon due to the high cost of indium gallium arsenide (InGaAs) detectors [19]. LWIR cameras are commonly referred to as 'thermal infrared' as they operate solely through thermal emission and do not require any external source of illumination [20].

Figure 7.
Infrared sensor types based on operational frequency bands.

Property | NIR | SWIR | LWIR
Known as | Reflected IR | Reflected IR | Thermal IR
Wavelength (μm) | 0.7–1.4 | 1.4–3 | 8–14
Imager | Silicon-based | InGaAs detectors | Photon detectors
Detection range | Short | Short | Long (165 m)
Usage | Night vision | Feature extraction | Heat detection
Applications | Driver monitoring | Road marking detection | Pedestrian and object detection

Table 2.
Details of various types of infrared images used in the automotive industry.
Representative images for visible camera (RGB), near-infrared (NIR), short-wavelength infrared (SWIR), mid-wavelength infrared (MWIR) and long-wavelength infrared (LWIR) cameras are shown in Figure 8 [16–18].

Figure 8.
Representative images for visible camera (RGB), near-infrared (NIR), short-wavelength infrared (SWIR), mid-wavelength infrared and long-wavelength infrared (LWIR) cameras [20].

The basic components of an infrared imaging system are shown in Figure 9. The system measures the IR radiation emitted from objects and converts it into electrical impulses using an IR detector. The converted signal is then transformed into a temperature map, taking ambient and atmospheric effects into account. The temperature map is displayed as an image, which can be color-coded to represent thermograms, thermal images or IR images using an imaging algorithm. The IR detector acts as a transducer that converts radiation into electrical signals. Microbolometric detectors are the most widely used IR detectors as they can operate at room temperature. A microbolometer is basically a resistor with a very small heat capacity and a high negative temperature coefficient of resistivity. IR radiation received by the detector changes the resistance of the microbolometers and produces the corresponding electrical outputs. Considering the ambient and atmospheric effects between the object and the IR detector, an infrared measurement model is used to convert the detected heat map into a temperature map. The measurement model depends on the emissivity of the object, the atmospheric and ambient temperature, the relative humidity and the distance between the object and the detector, as implemented, for example, in FLIR thermal cameras; it varies between manufacturers and IR detector characteristics. The measured temperature map is visually represented as a grayscale image. For industrial applications, pseudo color-coded images, called thermograms, can also be generated by the thermal imaging system for easy representation of differences in temperature distribution. Figure 10 shows representative RGB and corresponding IR thermal images used in ADAS and AV applications [17].

Figure 9.
Basic components of an IR imaging system.

Figure 10.
Representative RGB and IR images used in ADAS and AV applications from the KAIST multispectral pedestrian detection dataset [21].
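As a toy illustration of the radiometric measurement model described above, the following Python sketch converts a measured band radiance into an object temperature while compensating for emissivity, reflected ambient radiation and atmospheric transmission. It uses a single-band Stefan-Boltzmann approximation with assumed parameter values; it is not a specific manufacturer's calibration.

```python
# Illustrative radiometric measurement model for an uncooled LWIR imager.
# All parameter values are assumptions for demonstration purposes.
import numpy as np

SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def object_temperature(band_radiance: np.ndarray,
                       emissivity: float = 0.95,
                       t_ambient_k: float = 293.15,
                       atm_transmission: float = 0.98) -> np.ndarray:
    """Estimate object temperature (K) from measured band radiance (W/m^2)."""
    # Undo atmospheric attenuation, remove the reflected ambient component,
    # then invert the emission term for the object temperature.
    received = band_radiance / atm_transmission
    reflected = (1.0 - emissivity) * SIGMA * t_ambient_k ** 4
    emitted = (received - reflected) / emissivity
    return (np.clip(emitted, 1e-6, None) / SIGMA) ** 0.25

# Example: a band radiance of about 508 W/m^2 maps to roughly skin temperature (~310 K)
print(object_temperature(np.array([508.0])))
```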

4. Role of infrared sensors in ADAS and AV applications

In the automotive domain, it is highly challenging to achieve level 3 and above autonomous driving under all weather and lighting conditions and to deliver accurate environment perception. ADAS and AV systems are expected to support all driving conditions, highly complex roads and totally unpredictable situations, and vehicles must be fitted with cost-effective sensor suites that are capable of extracting the maximum information possible to make accurate decisions. The perceived environment is also expected to represent the scene information adequately such that computer vision algorithms can detect and classify the objects. This ensures precise autonomous navigation and control and provides safe, advanced ADAS and AV systems.
SAE automation level 2 systems on the commercial market already use vision, ultrasonic and radar sensors. The next level of SAE automation can be achieved by adding multiple sensors, including lidar sensors, to the existing sensor suite. Each sensor has its own limitations and advantages [16–18]. Thus, sensor fusion comes into play, where the advantages of different sensor modalities can be utilized to address the various limitations. Even so, NHTSA data show that the existing ADAS and AV solutions fall short of the expected performance, as crashes were reported by every vehicle supporting ADAS and AV features [2]. Also, the Uber and Tesla accidents clearly show that the current SAE automation level 2 and 3 sensor suites do not provide accurate detection of cars and pedestrians. In particular, vulnerable road users (VRUs) such as pedestrians, animals and bicyclists are challenging to detect and classify accurately. Classification of these objects is challenging in poor lighting, low light and dark scenarios, with direct sunlight in the driving direction and in extreme weather conditions such as fog, rain and snow. The performance of vision sensors does not meet the requirements of autonomous driving in such conditions. The performance of the other sensors is also more or less affected, and they fail to provide the complete environment perception needed for autonomous navigation. A combination of low-light vision sensors, lidar and radar can help to some extent in night scenarios up to 50 meters; beyond that, it is challenging to drive the vehicle autonomously [15]. Infrared sensor technology overcomes the aforementioned challenges and reliably detects and classifies cars, vehicles, pedestrians, animals and other objects that are common in driving scenarios. Also, IR sensors are capable of performing equally well in daylight conditions, thereby providing redundant information for the existing sensor suite and increasing confidence in detection and classification algorithms. IR sensors can be used effectively to address the limitations of vision and other sensors. The real-time performance of an IR camera is not affected by low light and dark scenarios, sun glare or vehicle headlight and brake light reflections. It can be considered a potential solution in extreme weather conditions such as snow, fog and rain. Uncooled thermal imaging systems available in the market are the most affordable low-cost IR sensors, thanks to advancements in microbolometer technology. These sensors are capable of generating the temperature map as an image to visually analyze and further process the environmental information. In existing automotive sensor suites, these sensors can supplement or even replace existing technology due to the advantage of sensing the infrared emission of objects and operating independently of illumination conditions, thereby providing a promising and consistent technology for achieving more precise environment perception systems [15].
As per the NTSB report, a fatal incident involving a pedestrian occurred with an Uber vehicle, a level 3 autonomous car, in Tempe, Arizona. This vehicle used lidar, radar and vision sensors. The report shows that the incident happened at night when only streetlights were lit. The system first classified the pedestrian as an unknown object, after some time as a car, then as a bicycle and finally as a person. This scenario was recreated and tested by FLIR using a wide field-of-view thermal camera with a basic classifier. The system was capable of detecting the person at a distance of approximately 85.4 m, which is twice the required stopping distance for a vehicle at 43 mph. Using narrow FOV cameras, FLIR IR cameras have demonstrated pedestrian detection performance even at distances four times greater than that required by the decision algorithms in autonomous vehicles [15].
Vehicles currently on the road with ADAS systems supporting SAE level 2 (partial automation) and level 3 (conditional automation) do not include an IR sensor in their sensor suite. The AWARE (All Weather All Roads Enhanced) vision project was executed in 2016 to test the potential of sensors operating in four different bands of the electromagnetic spectrum, namely visible RGB, near-infrared (NIR), short-wave infrared (SWIR) and long-wave infrared (LWIR), especially in challenging conditions such as fog, snow and rain. It was reported that the LWIR camera performed well in detecting pedestrians in extreme fog (visibility range = 15 ± 4 m) compared to NIR and SWIR, whereas the vision sensor showed the lowest detection performance. Similarly, the LWIR camera was capable of detecting pedestrians in extremely dark scenarios and when reflections were present due to other vehicles' headlights in fog, whereas the other sensors failed to detect pedestrians because they were not visible due to headlight glare/reflections [15, 22]. Also, as per Wien's displacement law, the peak radiation for human skin with an emissivity of 0.99 is at 9.8 μm at room temperature, which falls within the LWIR camera operating range. Therefore, being completely passive sensors, LWIR cameras take advantage of sensing the IR radiation emitted from objects irrespective of extreme weather conditions and illumination [16]. The distance at which an IR sensor can detect and classify an object depends on the field of view (FOV) of the camera. Narrow FOV cameras are capable of detecting objects at far distances, whereas wide FOV cameras are capable of detecting objects over a greater angle of view. Also, IR sensors require the target object to occupy about 20 × 8 pixels to be reliably detected and classified. An IR sensor with a narrow FOV lens is capable of detecting and classifying an object of 20 × 8 pixel size at a distance greater than 186 meters. Therefore, IR sensors with a narrow FOV can be used on highways to detect far objects, whereas wide FOV cameras can be used in urban or city driving scenarios [15].
In the automotive domain, IR sensors can be effectively used in both in-cabin sensing and driving applications. In in-cabin sensing applications, IR sensors are mounted inside the vehicle for driver drowsiness and fatigue detection, eye gaze localization, face recognition, occupant gender classification, facial expression and emotion detection, etc. In ADAS and AV systems, IR sensors can be efficiently used to generate precise and accurate perception of the surrounding environment using one or more cameras mounted on the vehicle. They are commonly used for object detection, classification and semantic segmentation [23].

5. AI applications in the development of ADAS and AV systems

In the automotive domain, the challenge of achieving full level 5 autonomous driving now seems addressable thanks to advancements in sensing technologies and artificial intelligence (AI). Advancements in AI help meet the requirements of AV systems such as perceiving, thinking and reasoning. In addition, advanced computing resources to process the huge amount of data sensed through multiple sensors of different modalities also play an important role in moving the research and development community towards autonomous vehicles. In ADAS and AV systems, AI has become the gold-standard approach for perceiving the surrounding environment so that proper planning and vehicle motion can be achieved. In level 3 and above ADAS and AV systems, AI is used primarily in perception, localization and mapping, and decision-making. There is a need to understand the various aspects of AI applied in AV development and the current practices in bringing AI systems to autonomous driving. In particular, deep learning (DL) based algorithms are capable of handling various challenging tasks such as accurate and precise object detection and classification, and appropriate control of the steering wheel, acceleration and deceleration. AI research focuses on DL approaches, including convolutional neural networks, LSTMs and deep belief networks, for vehicle perception, motion planning, path planning and decision making [4, 7]. Generally, DL-based approaches are used to recognize objects on the road within a perception module and a localization and mapping module. On-road objects are detected, classified, fused and tracked in the perception module, which receives input from multiple sensors of different modalities such as lidars, radars, vision and IR cameras. In addition to on-road objects, vision sensor-based traffic sign detection, lane detection and parking line detection are also achieved using DL approaches. DL can also be used to generate object-level information such as object position, size, class, distance and orientation from the ego vehicle, as well as semantic information in the form of pixel-wise object classes. Most commonly, convolutional neural networks (CNN) are used for object detection and classification tasks. A generic representation of a convolutional neural network and its components is shown in Figure 11. Any deep learning model has an input layer, a few hidden layers and a final fully connected layer called the output layer. The input layer takes an image as input, and the final output layer defines the detected objects along with their confidence scores. A combination of a convolution layer, a pooling layer and an activation layer represents one hidden layer, and in deep networks, multiple such feature extraction layers are stacked to extract coarse-to-fine information. Finally, a softmax function is used to classify the detected objects with corresponding confidence scores based on feature similarity. The object with the highest confidence score is recognized as the object class, together with the detected bounding box information [24]. CNN, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, YOLOv2, YOLOv3, etc., are a few DL approaches that are most commonly used for the recognition of road objects. In level 3 and above ADAS and AV systems, multi-task networks are predominantly used, where a common network architecture is trained to perform multiple tasks. End-to-end AI systems can also be used to generate the complete perception for ADAS and AV systems, including perception, localization and mapping algorithms [7].

Figure 11.
Representative convolutional neural network and its components.
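To make the CNN structure described above concrete, the following minimal PyTorch sketch stacks a few convolution + activation + pooling blocks, a fully connected output layer and a softmax over class scores. Layer sizes and the number of classes are arbitrary assumptions, not a production ADAS network.

```python
# Minimal CNN classifier sketch: input layer -> hidden conv blocks -> softmax output.
import torch
import torch.nn as nn

class SimpleCnnClassifier(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        # Each hidden block = convolution + activation + pooling, as described above
        self.hidden = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.output = nn.Linear(32, num_classes)  # fully connected output layer

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.hidden(image).mean(dim=(2, 3))   # global average pooling
        logits = self.output(feats)
        return torch.softmax(logits, dim=1)           # per-class confidence scores

probs = SimpleCnnClassifier()(torch.rand(1, 3, 224, 224))
predicted_class = probs.argmax(dim=1)                 # class with highest confidence
```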

6. Survey on deep learning approaches using infrared images for ADAS and AV applications

Training, testing and validation are the most important steps to be considered for any deep learning-based perception algorithm. After proper training of the networks, they can be deployed in real time to perceive the environment successfully. The captured dataset and its size used for training the model play a critical role when deploying the model in real time. Convolutional neural networks (CNN) have significantly improved the performance of many ADAS and AV applications, but they require significant amounts of training data to obtain optimum performance and reliable validation outcomes. Fortunately, there are multiple large-scale publicly available thermal datasets with annotations which can be used for training CNNs. However, compared to visible imaging data, there are not many 2-dimensional thermal datasets available for automotive applications on the open internet. Table 3 lists the available datasets captured by various types of thermal sensors in different environmental conditions. These datasets are widely used for building pre-trained CNN models for applications such as pedestrian detection, vehicle detection and classification, and small object detection in the thermal spectrum on GPU/edge-GPU devices for the automotive sensor suite [31]. There are, however, inherent challenges associated with training and validating CNN models on thermal data: a limited number of publicly available datasets, little variability in the scenes in terms of weather, lighting and heat conditions, and difficulties in using RGB pre-trained CNN models for thermal data.

6.1 Object detection in the thermal spectrum

Object detection is an important part of autonomous driving; it is expected to accurately and rapidly detect vulnerable road users (VRUs) like pedestrians and cyclists, as well as vehicles, traffic signals, sign boards, animals, etc., during all lighting and weather conditions. These detection results are used for motion tracking and pose estimation of the objects and are subsequently utilized to take appropriate actions while ADAS technologies like cruise control, lane departure systems and emergency braking are in action.

6.1.1 Classical detection techniques

The acquisition of frames, selection of a region of interest, feature extraction and classification are the generic steps followed for VRU detection. Identifying the region of interest (ROI) [32] is the first step in detecting the desired objects, after which features such as edges, shapes and curvature are extracted. These extracted features are further used for object classification. Background subtraction is the most commonly used technique for detecting moving objects, while more advanced techniques such as sliding windows, objectness and selective search are used to counter adverse conditions. Histogram of Oriented Gradients (HOG) features [33] and Local Binary Patterns (LBP) [34] are basic hand-crafted image processing techniques used to extract features and classify objects, but they are limited when complex features need to be extracted, whereas deep learning-based techniques allow the network itself to extract the features, which can provide a higher level of information. Then a support vector machine, a decision tree or a deep neural network is used to classify the object. DL-based techniques are found to outperform the traditional methods [35].

Dataset | Size | Resolution | Thermal images | Content
C3I [25] | 0.5 GB | 640×480 | 39,000 | Person, vehicles, bicycles, bikes
LITIV [26] | 0.7 GB | 320×430 | 6000 | Person
CVC [27] | 5 GB | 640×480 | 11,000 | Person, vehicles, bicycles, bikes, poles
TIV [28] | 5.4 GB | 1024×1024 | 63,000 | Person, vehicles, bicycles
FLIR [29] | 16 GB | 640×512 | 14,000 | Person, vehicles, bicycles, bikes, poles, dogs
KAIST [30] | 37 GB | 320×256 | 95,000 | Person, vehicles, bicycles, bikes, poles

Table 3.
Details of various open-access infrared datasets related to ADAS applications.
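As a concrete example of the classical pipeline described above (sliding-window region selection, hand-crafted HOG features and an SVM classifier), the sketch below uses OpenCV's built-in HOG people detector, which is trained on visible imagery; a thermal detector would need its own trained coefficients. File names and the confidence threshold are illustrative assumptions.

```python
# Classical HOG + linear-SVM pedestrian detection using OpenCV's default people detector.
import cv2

def detect_pedestrians(image_path: str):
    image = cv2.imread(image_path)                    # visible-spectrum frame (assumption)
    hog = cv2.HOGDescriptor()                         # default 64x128 person descriptor
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    # Sliding-window detection over an image pyramid
    boxes, weights = hog.detectMultiScale(image, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    for (x, y, w, h), score in zip(boxes, weights):
        if float(score) > 0.5:                        # assumed confidence threshold
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return image, boxes

if __name__ == "__main__":
    annotated, detections = detect_pedestrians("frame.png")  # hypothetical input frame
    cv2.imwrite("detections.png", annotated)
```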

6.1.2 Deep learning approach for object detection

The two commonly used deep learning approaches for VRU detection are the two-stage detector (region proposal approach) and the single-stage detector (non-region proposal approach). The two-stage or region proposal approach uses hand-crafted techniques such as HOG or LBP for feature extraction in the first stage and CNN networks for classification in the second stage, as in region-CNN (R-CNN) [36], region-based fully convolutional networks (R-FCN) [37] and Faster R-CNN [38]. In contrast, a single-stage detector performs region proposal, feature extraction and classification in a single step. Some of the non-region proposal-based approaches include the single shot detector (SSD) [39] and You Only Look Once (YOLO) [40]. The advantages and disadvantages of two-stage and single-stage detectors are presented in Table 4.

6.1.3 Deep learning based sensor fusion techniques

Many studies have investigated the most reliable approach to using both color and thermal information from visible and thermal cameras [41–43]. Commonly, these studies highlight the illumination dependency of visible cameras as well as their limitations during adverse weather conditions, and the benefits of including thermal data for better performance. The fusion of visible and thermal cameras helps reduce the inaccuracies of object detection, mainly during nighttime. The fusion of this sensor information is possible at various levels, such as pixel level, feature level or decision level. For pixel-level fusion, the thermal image intensity values can be fused with the visible images in the intensity (I) component, and the fused image can be reconstructed with the new I value. In general, pixel-level fusion is done through methods such as wavelet-based transform, curvelet transform and Laplacian pyramid transform fusion. Typically, pixel-level fusion is not used with deep learning-based sensor fusion, as it takes place outside the neural network.
The typical architectures for deep learning-based sensor fusion are early fusion, late fusion and halfway fusion, as shown in Figure 12. Early fusion is also called feature-level fusion, in which the visible and thermal images are combined into a 4-channel (red, green, blue, intensity (RGBI)) input for the deep learning network to learn the relationship between the image sources. Late fusion is also called decision-level fusion, in which feature extraction for the visible and thermal images happens separately in subnetworks, and the results are fused just before the object classification layer. Halfway fusion is another approach, in which the visible and thermal information is fed separately into the same network and the fusion happens inside the network itself.

Type | Examples | Advantages | Trade-offs
Single-stage | SSD, YOLO | Higher speeds | Information loss, large number of false positives
Two-stage | R-CNN, R-FCN, Faster R-CNN | Increased accuracy, information rich | Slower speeds, complex computation

Table 4.
Deep learning-based object detection types.

Figure 12.
Sensor fusion techniques [35].

There are various studies which demonstrate the benefits of multispectral detection techniques, which produced the best results when combining visible and thermal images. However, during night conditions thermal cameras performed much better than the fused data. In low light conditions, fused data performed worse, with an average miss rate of 3%, and an overall decrease of 5% during daytime was observed. Also, the usage of multiple sensors increases system complexity due to differences in sensor positions, alignment, synchronization and resolution of the cameras used [35].
By far, halfway fusion is considered the most effective of the three techniques, with a 3.5% lower miss rate. Also, using stand-alone visible or thermal information was shown to perform worse than halfway fusion by 11% [44].
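A minimal sketch of the early (feature-level) fusion idea, assuming the RGB and thermal frames are already registered: the thermal intensity map is stacked with the RGB image into a 4-channel RGBI tensor and fed to a single backbone. The layer sizes and class count are illustrative assumptions, not a published architecture.

```python
# Early fusion sketch: concatenate RGB (3 ch) and thermal intensity (1 ch) into RGBI input.
import torch
import torch.nn as nn

class EarlyFusionBackbone(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),  # 4 input channels: R, G, B, I
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # rgb: (N, 3, H, W), thermal: (N, 1, H, W), spatially aligned
        rgbi = torch.cat([rgb, thermal], dim=1)          # fusion at the input (early fusion)
        feats = self.features(rgbi).flatten(1)
        return self.classifier(feats)

# Usage with dummy, aligned frames
scores = EarlyFusionBackbone()(torch.rand(2, 3, 256, 256), torch.rand(2, 1, 256, 256))
```

Late fusion would instead run two separate subnetworks on the RGB and thermal inputs and merge their outputs near the classification layer, while halfway fusion merges intermediate feature maps inside the network.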

Wagner et al. [45] investigated optimal fusion techniques for pedestrian detection using Faster R-CNN and the KAIST dataset and found that multispectral information with single-stage and halfway fusion can achieve better performance. Similarly, various studies have highlighted the importance of the detection stages and sensor fusion techniques under various lighting conditions and draw similar conclusions [46]. However, these deep learning models must estimate the bounding box of the object appropriately and calculate the probability of the class it belongs to via neural networks, which makes them unsuitable for real-time applications [40].
DenseFuse, a deep learning architecture for extracting more useful features proposed by Li et al. [47], uses a combination of CNNs, fusion layers and dense blocks to create a reconstructed fused image that outperforms the existing fusion methods. The SeAFusion network combines the image and semantic segmentation information and uses gradient residual blocks to enhance the image fusion process [48]. An unsupervised fusion network called U2Fusion was proposed by Xu et al. [49] to best estimate the fusion process considering the importance of the source images. Similarly, multiple image fusion networks have been proposed, such as the end-to-end fusion network (RFN-Net), the effective bilateral mechanism (BAM) and the bilateral ReLU residual network (BRRLNet) [50, 51]. Despite so much progress in deep learning-based fusion architectures, none of them is a lightweight real-time application; they are all limited by appropriate hyper-parameter selection and significant memory utilization. The advantages and disadvantages of the image fusion models in the literature are summarized in Table 5.

6.1.4 Real-time object detection

In the YOLO framework [40], both creating the bounding box and classifying the image are treated as a single regression problem to enhance the inference speed, and the neural network is trained on the task as a whole. YOLO creates an m×n grid over the input image, then predicts N bounding boxes and estimates the confidence score of each bounding box (BB) using the CNN. Each BB consists of (x, y) central coordinates and (w, h) width and height, along with a class probability value. The intersection over union (IoU) is calculated from the overlap between the detected BB and the actual ground truth (GT); the value of the IoU indicates how accurately the BB is predicted. The confidence of the BB is expressed as the product of the probability of the object and the IoU. If the central coordinates of the predicted BB and the GT lie within the IoU region, the detection is assumed to be successful and Pr(Object) is set to 1; otherwise it is set to 0. If there are i classes, the conditional class probability is expressed as Pr(Class_i | Object). The BB with the highest probability of the classified object among all possible N BBs is considered the best-fit BB for the concerned object.
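Written out, the confidence terms described above take the standard YOLO form (textbook notation, not tied to a particular implementation):

```latex
\text{confidence} = \Pr(\text{Object}) \cdot \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}}

\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}}
  = \Pr(\text{Class}_i) \cdot \mathrm{IoU}^{\mathrm{truth}}_{\mathrm{pred}}
```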

Literature | Advantages | Disadvantages
DenseFuse [47] | Extracts useful features | Loss of contrast and brightness
SeAFusion [48] | Combines fusion and semantic segmentation | Cannot handle complex scenes
U2Fusion [49] | Adapts to new fusion tasks | Not robust to noise
Y-shaped net [50] | Extracts local features and context info | Introduces artifacts or blur
RFN-Net [51] | Two-stage training strategy | Large amount of training data and time

Table 5.
Summary of image fusion model-related literature.


Yoon and Cho [52] proposed a multimodal YOLO-based object detection method based on late fusion using non-maximum suppression to efficiently extract the features of an object, using color information from visible cameras and boundary information from thermal cameras. The architectural block diagram is shown in Figure 13. Non-maximum suppression is generally employed towards the second half of the detection model to improve the object detection performance of models like YOLO and SSD.
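A minimal sketch of greedy non-maximum suppression as applied after detectors such as YOLO or SSD; the corner-format boxes and the IoU threshold are assumptions for illustration.

```python
# Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat.
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.5):
    """boxes: (N, 4) as [x1, y1, x2, y2]; returns indices of the kept boxes."""
    order = scores.argsort()[::-1]            # highest-confidence boxes first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the best box with all remaining boxes
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou < iou_threshold]     # suppress boxes overlapping the kept one
    return keep
```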
Further, they also proposed an improved deep multimodal object detection strategy by introducing a dehazing network to enhance the performance of the model under reduced visibility. The dehaze network comprises haze level classification, light scattering coefficient estimation from visible images and depth estimation from thermal images. Detailed performance metrics for the dense haze condition from Yoon and Cho [52] are presented in Table 6, based on YOLO trained for (a) visible, (b) IR/thermal, (c) visible and IR, and (d) visible, IR and dehaze network. Example output images from Yoon and Cho [52] of the vehicle detection results, based on the YOLO model trained for visible alone, IR/thermal alone, fused visible and IR, and fused visible, IR and dehaze model, are shown in Figure 14. Missed detections are marked with red boxes and correct detections with blue boxes. The accuracy of the vehicle detection model improved from 81.11% to 84.02% when the dehaze network was added to the fusion model, but the run time was badly impacted; hence the dehaze model is unfit for real-time applications.

6.1.5 Real-time pedestrian detection

Chen et al. [53] proposed a thermal based R-CNN model for pedestrian detection
using VGG-16 as a backbone network as it has good network stability which enables
the integration of any new branch network. To address the pedestrian occlusion prob-
lem, they have proposed a part model architecture with new aspect ratio and block
model to strengthen the network’s generalization. The presence and resemblance of
pedestrian will be completely lost if the occlusion rate is over 80%, henceforth, train-
ing pedestrians occlusion <80% only be considered. Figure 15 represents the possible
types of pedestrian occlusion and the ground truth labelling and possible detection
are represented as green and red rectangle boxes, respectively. Figure 16 shows the

Figure 13.
Block diagram of the multimodal YOLO-based object detection method based on late fusion [52].


Metrics | Visible | IR/Thermal | Visible+IR | Visible+IR+Dehaze
Accuracy (%) | 61.55 | 78.6 | 81.11 | 84.02
TP | 1264 | 1626 | 1683 | 1747
FP | 30 | 182 | 238 | 269
Precision | 0.97 | 0.90 | 0.87 | 0.86
Recall | 0.61 | 0.79 | 0.82 | 0.85
Run time (ms) | 27.82 | 27.82 | 27.89 | 686.99

Table 6.
Performance of the vehicle detection model in the dense haze condition, based on YOLO trained for (a) visible, (b) IR/thermal, (c) visible and IR, and (d) visible, IR and dehaze network. Results from [52].

Figure 14.
Examples of vehicle detection results based on YOLO model trained for (a) visible (b) IR/thermal (c) visible and
IR and (d) visible, IR and dehaze network. Missed detection is marked in red box and correct detection is marked
in blue box. Results from [52].

Figure 16 shows the architecture of the thermal R-CNN fusion model for improved
pedestrian detection proposed in [53], which comprises a full-body and
region-decomposition branch to extract the full-body features of pedestrians and a
segmentation-head branch to separate individual pedestrians in crowded scenes. The
loss function is defined as the combination of five components: BB loss, classification
loss, segmentation loss, pixel-level loss and fusion loss.
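A composite loss of this kind is typically implemented as a weighted sum of the individual terms. The sketch below is an illustrative PyTorch-style formulation under assumed tensor keys and unit weights; it is not the exact loss of [53].

```python
# Sketch of a composite training loss of the kind described above (bounding-box,
# classification, segmentation, pixel-level and fusion terms). The individual term
# implementations, tensor keys and weights are illustrative assumptions.
import torch
import torch.nn.functional as F

def total_loss(pred, target, w=(1.0, 1.0, 1.0, 1.0, 1.0)):
    l_bb   = F.smooth_l1_loss(pred["boxes"], target["boxes"])            # BB regression
    l_cls  = F.cross_entropy(pred["logits"], target["labels"])           # classification
    l_seg  = F.binary_cross_entropy_with_logits(pred["seg"], target["seg"])  # segmentation
    l_pix  = F.l1_loss(pred["pix"], target["pix"])                       # pixel-level term
    l_fuse = F.mse_loss(pred["fused"], target["fused"])                  # fusion term
    terms = (l_bb, l_cls, l_seg, l_pix, l_fuse)
    return sum(wi * li for wi, li in zip(w, terms))
```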
Table 7 presents the performance comparison from [53] of the thermal R-CNN
fusion pedestrian detection model against state-of-the-art deep learning models,
showing that the thermal R-CNN fusion model is effective and performs better. The
model is sensitive to regional features, which can easily lead to misjudged images;
however, the semantic segmentation feature enriches the information of the complete
pedestrian bounding box, so the final output is accurate and the precision is higher.
Figure 17 shows example images from [53] that demonstrate the improved pedestrian
detection of the thermal R-CNN fusion model compared with the ground truth and the
benchmarked modified R-CNN model. It is evident from the results that the modified
R-CNN only partially detects occluded pedestrians and, in some cases, double partial
bounding boxes are created for single

Figure 15.
Illustration of types of pedestrian occlusion. (a) The green BB rectangle represents the full pedestrian and the red BB
rectangle represents the detectable pedestrian region. (b) The top six types of pedestrian occlusion [53].

Figure 16.
The architecture of the thermal R-CNN fusion model for improved pedestrian detection [53].

Metrics | Faster R-CNN | Mask R-CNN | YOLO | Thermal R-CNN fusion
TP | 992 | 1050 | 1250 | 1317
FP | 740 | 720 | 954 | 334
FN | 995 | 937 | 734 | 670
Precision | 0.572 | 0.593 | 0.567 | 0.797
Recall | 0.499 | 0.528 | 0.630 | 0.662
F1-score | 0.533 | 0.558 | 0.661 | 0.724

Table 7.
Performance comparison of various pedestrian detection models. Results from [53].

pedestrians, whereas this issue is addressed and the pedestrian detection is improved
in the thermal R-CNN fusion results.

6.1.6 Knowledge distillation

Knowledge distillation is widely known as a student-teacher approach for model
compression, where a student network is trained to match a large pre-trained teacher

Figure 17.
Example images to demonstrate the improved pedestrian detection by (c) the thermal R-CNN fusion model
compared to (a) ground truth and (b) modified R-CNN. Results from [53].

network. Here, information is transferred from the teacher network to the student
network by minimizing a loss computed against both the teacher's outputs and the
GT labels. Chen et al. [54] proposed to use the low-level features from the teacher
network to supervise and train the deeper features of the student network, resulting
in improved performance. To address the low-resolution issues of IR-visible fused
images, Xiao et al. [55] introduced a heterogeneous knowledge distillation network
with multi-layer attention embedding; this technique consists of a teacher network
with high-resolution fusion and a student network with low-resolution and
super-resolution fusion. Liu et al. [56] proposed a perceptual distillation method that
trains image fusion neural networks without GTs, in which a teacher network guides a
multi-autoencoder student network trained with self-supervision. Several further
studies of this kind have been reported recently and are summarized in Table 8.
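The core student-teacher idea can be written compactly: the student minimizes a weighted sum of the cross-entropy with the GT labels and the KL divergence between temperature-softened teacher and student outputs. The temperature and weighting below are illustrative assumptions, not values from the cited works.

```python
# Minimal student-teacher distillation sketch (illustrative, not the method of [54-58]):
# the student is trained on (a) cross-entropy with the ground-truth labels and
# (b) KL divergence between temperature-softened teacher and student logits.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, labels)          # supervision from GT labels
    soft = F.kl_div(                                         # supervision from the teacher
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```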

6.1.7 Adaptive mechanism

Over the years, many researchers have proposed to incorporate adaptive mechanisms
into image fusion networks, as such mechanisms enable the system to adjust its
behavior according to varying environmental or operating conditions. For a better
fusion effect, Xia et al. [59] introduced a parameter-adaptive pulse-coupled neural
network, and Kong et al. [60] proposed an adaptive normalization mechanism-based
fusion method. To extract features and estimate similarity, the authors of [61]
integrated a new strategy

Literature | Advantages | Disadvantages
Original knowledge distillation [54] | Compresses a large model | Loses information
Cross-stage connection path [55] | Uses low-level features to supervise deeper features | Increases the complexity
Heterogeneous [56] | Joint fusion and super-resolution | Depends on teacher network quality
Perceptual distillation [57] | Trains image fusion networks without ground truths | Depends on teacher network quality
Depth-distilled multi-focus [58] | Transfers depth knowledge to image fusion to improve fusion accuracy | Depends on accuracy of depth knowledge

Table 8.
Summary of knowledge-distillation network-related literature [57].

for image retrieval using adaptive features. In brief, this adaptive feature mechanism
blends the strengths of multiple features and outperforms single-feature retrieval
techniques. Such mechanisms can enhance image fusion precision, reduce noise
interference and improve the real-time performance of the network. In addition,
various adaptive mechanisms, such as adaptive selection of loss, activation and
sampling functions, have been proposed to optimize deep learning neural networks.
The advantages and disadvantages of the adaptive mechanism-related literature are
summarized in Table 9.
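As an illustration of such an adaptive mechanism, the sketch below predicts per-pixel fusion weights from the concatenated IR and visible feature maps and normalizes them with a softmax, so the mixing ratio adapts to the scene content. The layer sizes are assumptions, and the block does not correspond to any specific model in Table 9.

```python
# Sketch of an adaptive fusion block (illustrative assumption): a small convolution
# predicts one weight map per modality, and softmax normalisation lets the network
# adapt the IR/visible mixing ratio per pixel.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Two output channels: one weight map for the IR features, one for the visible features.
        self.weight_head = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, feat_ir, feat_vis):
        w = torch.softmax(self.weight_head(torch.cat([feat_ir, feat_vis], dim=1)), dim=1)
        return w[:, 0:1] * feat_ir + w[:, 1:2] * feat_vis

# Usage: fused = AdaptiveFusion(64)(ir_features, vis_features)
```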

6.1.8 Generative adversarial network

Ma et al. [65] introduced the generative adversarial network (GAN) to the field of
image fusion, making the generator synthesize fused images with good textures from
visible and thermal cameras under the guidance of a discriminator. A typical GAN-based
image fusion framework is represented in Figure 18. They also proposed a detail loss
and an edge-enhancement loss to enhance the quality of the minute details and sharpen the

Literature | Advantages | Disadvantages
Parameter-adaptive pulse-coupled neural network [59] | Better image fusion | Sensitive to parameter settings
Adaptive features and information entropy [60] | Effective feature extraction and good similarity estimation | Poor complex-scene handling
Adaptive normalization mechanism-based fusion [61] | Injects detailed features into structured fusion | Introduces artifacts or distortion
Adaptive loss, activation and sampling functions [62] | Optimizes the performance | Needs more computational resources
Global group sparse coding [63] | Automatic network depth estimation by learning inter-layer connections | May suffer from sparsity or redundancy issues
Novel network structures [64] | Outperform traditional ones | Difficult to design

Table 9.
Summary of adaptive mechanism network-related literature.


Figure 18.
Generative adversarial network based image fusion framework.

edges and features in the target images. Further, they attempted to improve the
GAN-based framework by including a double-discriminator conditional GAN.
Li et al. [66] showed improved capture of the region of interest by integrating a
multi-scale attention mechanism branch into the GAN-based image fusion framework.
These networks generate fused images of excellent quality, which are useful for
entertainment and human perception; however, they are not well suited to demanding
visual processing tasks such as those in automotive applications.
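The adversarial objective behind such fusion networks can be sketched as follows: the generator is rewarded for fooling the discriminator, while content and gradient (edge) terms preserve thermal intensities and visible-image edges, in the spirit of the detail and edge-enhancement losses described above. The architectures and loss weights are placeholders, not the exact formulation of [65, 66].

```python
# Sketch of a GAN-style fusion objective (illustrative assumptions; the generator and
# discriminator architectures are placeholders). The generator fuses IR and visible
# inputs; an extra gradient (edge) loss encourages sharp detail.
import torch
import torch.nn.functional as F

def gradient(img):
    """Simple horizontal/vertical finite differences used as an edge map."""
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def generator_loss(disc, fused, ir, vis, lam_edge=10.0):
    logits = disc(fused)
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))  # fool the discriminator
    fdx, fdy = gradient(fused)
    vdx, vdy = gradient(vis)
    edge = F.l1_loss(fdx, vdx) + F.l1_loss(fdy, vdy)   # keep visible-image edges
    content = F.l1_loss(fused, ir)                     # keep thermal intensities
    return adv + content + lam_edge * edge
```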

6.1.9 Hybrid model

Hybrid models [67] are created by combining models that perform well in specific
scenarios in order to enhance performance in generic situations. Hybrid models
generally combine multi-scale transformation with expression detection, saliency
detection, sparse representation or pulse-coupled neural networks. Such hybrid
models effectively improve the performance of fusion models by enhancing the
clarity and texture features of the fused images; however, model complexity and
computational cost must be carefully considered when designing them.

7. Conclusions

Level 3 and above ADAS and AV systems demand accurate and precise perception
of the surrounding environment in order to drive the vehicle autonomously. This
can be achieved using multiple sensors of different modalities such as vision, lidar
and radar. ADAS and AV systems provided by various OEMs and Tier 1 companies
show a lack of performance in extreme weather and lighting conditions, especially
dark scenarios, sun glare, rain, fog and snow. The performance can be improved
by adding another sensor to the existing sensor suite that provides complementary
and redundant information in such extreme environments. The properties and
characteristics of infrared sensors look promising, as they detect the natural emission
of IR radiation and represent it as an image that indicates a relative temperature map.
Recent advancements in AI have paved the way for efficient algorithms that detect
and classify objects accurately irrespective of weather and lighting conditions, and
the IR camera is also capable of detecting objects at longer distances than other
sensors. The literature shows that fusing IR sensor information with data from other
sensors yields more precise results, thereby paving the way towards autonomous
driving. The research community needs IR datasets comparable in scale to those
available for RGB images to enable quick and easy deployment of IR in ADAS and AV
applications. Integration in the automotive domain is currently challenging, as an IR
camera needs a separate calibration setup and remains an expensive technology due to
its sensor array. More intense research in IR technology and deep learning models
would be highly beneficial for using IR cameras effectively in ADAS and AV systems.

Abbreviations

ADAS advanced driver assistance system


AV autonomous vehicle
AD autonomous driving
NHTSA National Highway Traffic Safety Administration
OEM original equipment manufacturer
SAE society of automotive engineers
2D/3D two dimensional/three dimensional
GPS global positioning system
IMU inertial measurement unit
IR infrared
NIR near infrared
SWIR short wave infrared
FLIR forward looking infrared
MWIR mid wave infrared
LWIR long wave infrared
Lidar light detection and ranging
Radar radio detection and ranging
FOV field of view
AI artificial intelligence
VRU vulnerable road user
ROI region of interest
BB bounding box
GT ground truth
IoU intersection over union
CNN convolutional neural network
DL deep learning
GAN generative adversarial network
RGBI red, green, blue, intensity
HOG histogram of oriented gradients
LBP local binary patterns
R-CNN region-CNN
R-FCN region-based fully convolutional network
SSD single shot detector
YOLO you only look once


References

[1] Williams M. The drive for autonomous vehicles: The DARPA grand challenge. Available from: https://www.herox.com/blog/159-the-drive-for-autonomous-vehicles-the-darpa-grand

[2] US Department of Transportation. Standing general order on crash reporting: For incidents involving ADS and level 2 ADAS. Jun 2022. Available from: https://www.nhtsa.gov/laws-regulations/standing-general-order-crash-reporting

[3] Kukkala VK, Tunnell J, Pasricha S, Bradley T. Advanced driver-assistance systems: A path toward autonomous vehicles. IEEE Consumer Electronics Magazine. 2018;7(5):18-25

[4] Moller DP, Haas RE. Advanced driver assistance systems and autonomous driving. In: Guide to Automotive Connectivity and Cybersecurity: Trends, Technologies, Innovations and Applications. Cham: Springer; 2019. pp. 513-580

[5] Rosique F, Navarro PJ, Fernández C, Padilla A. A systematic review of perception system and simulators for autonomous vehicles research. Sensors. 2019;19(3):648

[6] Odukha O. How sensor fusion for autonomous cars helps avoid deaths on the road. Intellias; Aug 2023. Available from: https://intellias.com/sensor-fusion-autonomous-cars-helps-avoid-deaths-road/

[7] Ma Y, Wang Z, Yang H, Yang L. Artificial intelligence applications in the development of autonomous vehicles: A survey. IEEE/CAA Journal of Automatica Sinica. 2020;7(2):315-329

[8] Website blog. Available from: https://interplex.com/trends/proliferation-of-sensors-in-next-gen-automobiles-is-raising-the-b [Accessed: September 28, 2023]

[9] Ondruš J, Kolla E, Vertaľ P, Šarić Ž. How do autonomous cars work? Transportation Research Procedia. 2020;44:226-233

[10] Thakur R. Infrared sensors for autonomous vehicles. Recent Development in Optoelectronic Devices. 29 Aug 2018;84

[11] Mohammed AS, Amamou A, Ayevide FK, Kelouwani S, Agbossou K, Zioui N. The perception system of intelligent ground vehicles in all weather conditions: A systematic literature review. Sensors. 2020;20:6532. DOI: 10.3390/s20226532

[12] ASTM G173-03. Standard Tables for Reference Solar Spectral Irradiances: Direct Normal and Hemispherical on 37deg Tilted Surface. American Society for Testing Materials; 2012. Available from: https://www.astm.org/g0173-03r20.html

[13] Uber Accident. 2018. Available from: https://en.wikipedia.org/wiki/Death_of_Elaine_Herzberg

[14] Image Engineering. Challenges for cameras in automotive applications. Feb 2022. Available from: https://www.image-engineering.de/library/blog/articles/1157-challenges-for-cameras-in-automotive-applications

[15] Why ADAS and autonomous vehicles need thermal infrared cameras. 2018. Available from: https://www.flir.com/ [Accessed: September 25, 2023]

[16] Minkina W, Dudzik S. Infrared Thermography: Errors and Uncertainties. Hoboken, New Jersey, United States: John Wiley & Sons; 2009

[17] Vollmer M. Infrared thermal imaging. In: Computer Vision: A Reference Guide. Cham: Springer International Publishing; 2021. pp. 666-670

[18] Teledyne FLIR Commercial Systems. The Ultimate Infrared Handbook for R & D Professionals. 2018. Available from: https://www.flir.com/ [Accessed: September 25, 2023]

[19] Li Y, Moreau J, Ibanez-Guzman J. Emergent visual sensors for autonomous vehicles. IEEE Transactions on Intelligent Transportation Systems. 2023;24(5):4716-4737. Available from: https://ieeexplore.ieee.org/document/10092278

[20] Nicolas Pinchon M, Ibn-Khedher OC, Nicolas A, Bernardin F, et al. All-weather vision for automotive safety: Which spectral band? SIA Vision 2016. In: International Conference Night Drive Tests and Exhibition, Oct 2016, Paris, France

[21] Hwang S, Park J, Kim N, Choi Y, So Kweon I. Multispectral pedestrian detection: Benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pp. 1037-1045. Available from: https://soonminhwang.github.io/rgbt-ped-detection/

[22] Nicolas Pinchon M, Ibn-Khedher OC, Nicolas A, Bernardin F, et al. All-weather vision for automotive safety: Which spectral band? In: SIA Vision 2016 - International Conference Night Drive Tests and Exhibition, Paris, France. 2016. p. 7. Available from: https://hal.science/hal-01406023/document

[23] Farooq MA, Shariff W, O'Callaghan D, Merla A, Corcoran P. On the Role of Thermal Imaging in Automotive Applications: A Critical Review. IEEE Access; 2023

[24] Shahriar N. What is convolutional neural network – CNN (Deep Learning). Available from: https://nafizshahriar.medium.com/what-is-convolutional-neural-network-cnn-deep-learning-b3921bdd82d5

[25] Farooq MA, Shariff W, Khan F, Corcoran P, Rotariu C. C3I thermal automotive dataset. IEEE Dataport; 2022. DOI: 10.21227/jf21-rt22. Available from: https://ieee-dataport.org/documents/c3i-thermal-automotivedataset

[26] Torabi A, Masse G, Bilodeau G-A. An iterative integrated framework for thermal visible image registration, sensor fusion, and people tracking for video surveillance applications. Computer Vision and Image Understanding. 2012;116(2):210-221

[27] Chen Y, Shin H. Pedestrian detection at night in infrared images using an attention-guided encoder-decoder convolutional neural network. Applied Sciences. 2020;10(3):809

[28] Wu Z, et al. A thermal infrared video benchmark for visual analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus, Ohio: IEEE; 2014. pp. 201-208

[29] Krišto M, Ivašić-Kos M. Thermal imaging dataset for person detection. In: 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). FLIR thermal dataset; 20 May 2019. pp. 1126-1131. Available from: https://www.flir.com/oem/adas/adas-dataset-form/

[30] Hwang S, Park J, Kim N, Choi Y, Kweon IS. Multispectral pedestrian detection: Benchmark dataset and baseline. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE; 2015. pp. 1037-1045

[31] Farooq MA, Shariff W, Ocallaghan D, Merla A, Corcoran P. On the Role of Thermal Imaging in Automotive Applications: A Critical Review. Vol. 11. IEEE Access; 2023. pp. 25152-25173. Available from: https://ieeexplore.ieee.org/document/10064306

[32] Solichin A, Harjoko A, Eko A. A survey of pedestrian detection in video. International Journal of Advanced Computer Science and Applications. 2014:5. DOI: 10.14569/IJACSA.2014.051007. Available from: https://thesai.org/Publications/ViewPaper?Volume=5&Issue=10&Code=ijacsa&SerialNo=7

[33] Chavez-Garcia RO, Aycard O. Multiple sensor fusion and classification for moving object detection and tracking. IEEE Transactions on Intelligent Transportation Systems. 2016;17:525-534. Available from: https://ieeexplore.ieee.org/document/7283636

[34] Wang X, Han TX, Yan S. An HOG-LBP human detector with partial occlusion handling. In: Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September-2 October 2009. Japan: IEEE; 2009. pp. 32-39. Available from: https://ieeexplore.ieee.org/document/5459207

[35] Ahmed S, Huda MN, Rajbhandari S, Saha C, Elshaw M, Kanarachos S. Pedestrian and cyclist detection and intent estimation for autonomous vehicles: A survey. Applied Sciences. 2019;9:2335. DOI: 10.3390/app9112335

[36] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24-27 June 2014. Columbus, Ohio: IEEE; 2014. pp. 580-587

[37] Dai J, Li Y, He K, Sun J. R-FCN: Object detection via region-based fully convolutional networks. In: Proceedings of the IEEE Conference on Advances in Neural Information Processing, Barcelona, Spain. Spain: IEEE; 2016. pp. 379-387

[38] Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems. 2015;39:1137-1149

[39] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, et al. SSD: Single shot multibox detector. In: European Conference on Computer Vision. Cham, Switzerland: Springer; 2016. pp. 21-37

[40] Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: Unified, real-time object detection. arXiv 2015, arXiv:1506.02640

[41] Geronimo D, Lopez AM, Sappa AD, Graf T. Survey of pedestrian detection for advanced driver assistance systems. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2010;32:1239-1258

[42] Enzweiler M, Gavrila DM. Monocular pedestrian detection: Survey and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2009;31:2179-2195

[43] Dollár P, Wojek C, Schiele B, Perona P. Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011

[44] Hou YL, Song Y, Hao X, Shen Y, Qian M. Multispectral pedestrian detection based on deep convolutional neural networks. In: Proceedings of the IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xiamen, China. 2017. pp. 22-25

[45] Wagner J, Fischer V, Herman M. Multispectral pedestrian detection using deep fusion convolutional neural networks. In: Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium. Belgium: ESANN; 2016. pp. 27-29

[46] Du X, El-Khamy M, Lee J, Davis L. Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection. In: Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision, WACV 2017, Santa Rosa, CA, USA. CA, USA: IEEE; 2017. pp. 953-961

[47] Li H, Wu XJ. DenseFuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing. 2018;28:2614-2623

[48] Tang L, Yuan J, Ma J. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Information Fusion. 2022;82:28-42

[49] Xu H, Ma J, Jiang J, Guo X, Ling H. U2Fusion: A unified unsupervised image fusion network. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2020;44:502-518

[50] Tang W, He F, Liu Y. YDTR: Infrared and visible image fusion via Y-shape dynamic transformer. IEEE Transactions on Multimedia. 2023;25:5413-5428. DOI: 10.1109/TMM.2022.3192661

[51] Li H, Wu XJ, Kittler J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Information Fusion. 2021;73:72-86

[52] Yoon S, Cho J. Deep multimodal detection in reduced visibility using thermal depth estimation for autonomous driving. Sensors. 2022;22:5084. DOI: 10.3390/s22145084

[53] Chen Y, Shin H. Pedestrian detection at night in infrared images using an attention-guided encoder-decoder convolutional neural network. Applied Sciences. 2020;10:809. DOI: 10.3390/app10030809

[54] Chen P, Liu S, Zhao H, Jia J. Distilling knowledge via knowledge review. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA. IEEE; 2021. pp. 5006-5015

[55] Xiao W, Zhang Y, Wang H, Li F, Jin H. Heterogeneous knowledge distillation for simultaneous infrared-visible image fusion and super-resolution. IEEE Transactions on Instrumentation and Measurement. 2022;71:1-15

[56] Liu X, Hirota K, Jia Z, Dai Y. A multi-autoencoder fusion network guided by perceptual distillation. Information Sciences. 2022;606:1-20

[57] Zhao Z, Su S, Wei J, Tong X, Gao W. Lightweight infrared and visible image fusion via adaptive DenseNet with knowledge distillation. Electronics. 2023;12:2773. DOI: 10.3390/electronics12132773

[58] Mi J, Wang L, Liu Y, Zhang J. KDE-GAN: A multimodal medical image-fusion model based on knowledge distillation and explainable AI modules. Computers in Biology and Medicine. 2022;151:106273

[59] Xia J, Lu Y, Tan L. Research of multimodal medical image fusion based on parameter-adaptive pulse-coupled neural network and convolutional sparse representation. Computational and Mathematical Methods in Medicine. 2020;2020:3290136

[60] Lu X, Zhang L, Niu L, Chen Q, Wang J. A novel adaptive feature fusion strategy for image retrieval. Entropy. 2021;23:1670

[61] Wang L, Hu Z, Kong Q, Qi Q, Liao Q. Infrared and visible image fusion via attention-based adaptive feature fusion. Entropy. 2023;25:407

[62] Zeng S, Zhang Z, Zou Q. Adaptive deep neural networks methods for high-dimensional partial differential equations. Journal of Computational Physics. 2022;463:111232

[63] Yuan J, Pan F, Zhou C, Qin T, Liu TY. Learning structures for deep neural networks. 27 May 2021. arXiv:2105.13905

[64] Li H, Yang Y, Chen D, Lin Z. Optimization algorithm inspired deep neural network structure design. In: Asian Conference on Machine Learning. PMLR; 4 Nov 2018. pp. 614-629. arXiv 2018, arXiv:1810.01638

[65] Ma J, Yu W, Liang P, Li C, Jiang J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Information Fusion. 2019;48:11-26

[66] Li J, Huo H, Li C, Wang R, Feng Q. AttentionFGAN: Infrared and visible image fusion using attention-based generative adversarial networks. IEEE Transactions on Multimedia. 2021;23:1383-1396

[67] Ma W, Wang K, Li J, Yang SX, Li J, Song L, et al. Infrared and visible image fusion technology and application: A review. Sensors. 2023;23:599. DOI: 10.3390/s23020599
