
INTEGRATION the VLSI journal 59 (2017) 148–156


Algorithm and hardware implementation for visual perception system in autonomous vehicle: A survey

Weijing Shi a, Mohamed Baker Alawieh a, Xin Li b,c,*, Huafeng Yu d

a Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
b Electrical and Computer Engineering Department, Duke University, Durham, NC 27708, USA
c Institute of Applied Physical Sciences and Engineering, Duke Kunshan University, Kunshan, Jiangsu 215316, China
d Boeing Research and Technology, Huntsville, AL 35758, USA

Keywords: Algorithm; Hardware; Autonomous vehicle; Visual perception

ABSTRACT

This paper briefly surveys the recent progress on visual perception algorithms and their corresponding hardware implementations for the emerging application of autonomous driving. In particular, vehicle and pedestrian detection, lane detection and drivable surface detection are presented as three important applications for visual perception. On the other hand, CPU, GPU, FPGA and ASIC are discussed as the major components to form an efficient hardware platform for real-time operation. Finally, several technical challenges are presented to motivate future research and development in the field.

1. Introduction

The last decade has witnessed tremendous development of autonomous and intelligent systems: satellites in space, drones in air, autonomous vehicles on road, and autonomous vessels in water. These autonomous systems aim at progressively taking over repeated, tedious and dangerous operations from humans, especially in extreme environments. With the Grand Challenge and Urban Challenge of autonomous vehicles, organized by the Defense Advanced Research Projects Agency (DARPA) in 2004 and 2007 respectively [1], autonomous vehicles and their enabling technologies have received broad interest as well as investment from both academia and industry. After these challenges, major developments quickly switched from academic research to industrial commercialization. Automotive Original Equipment Manufacturers (OEMs) such as GM, BMW, Tesla, Daimler and Nissan, tier-one suppliers such as Bosch, Denso and Delphi, as well as software companies such as Google, Uber and Baidu, have progressively joined the global competition for self-driving technology. Many have already noticed Google self-driving cars in Mountain View, CA and Austin, TX, and Uber cars in Pittsburgh, PA for road testing.

The revolution of autonomous driving brings up many discussions on issues related to society, policy, legislation, insurance, etc. For instance, how would society accept autonomous vehicles when their behavior is still unknown and/or unpredictable? How should policy be made to accelerate the development and deployment of autonomous vehicles? What laws should be built to regulate autonomous vehicles for their integration into our society? How do we handle accidents and insurance involving autonomous vehicles? The recent reports on artificial intelligence [2,3] may be good references for thinking about and addressing these questions.

Different from conventional vehicles, autonomous vehicles are equipped with new electrical and mechanical devices for environment perception, communication, localization and computing. These devices include radar, LIDAR, ultrasonic sensors, GPS, cameras, GPUs, FPGAs, etc. They also integrate new information processing algorithms for machine learning, signal processing, encryption/decryption and decision making. The autonomy level [4] of these vehicles is ultimately determined by the combination of all devices and algorithms at their different levels of maturity.

In the literature, a number of these new devices and technologies were first integrated into vehicles as the enablers for Advanced Driver Assistance Systems (ADAS). However, ADAS only provides simple and partial autonomous features at low levels of autonomy. Yet, ADAS has been proven valuable in improving vehicle safety. Examples of ADAS include lane departure warning, adaptive cruise control, blind spot monitoring, automatic parking, etc. These systems generally work with the conventional vehicle E/E (Electrical/Electronic) architecture and do not require any major modification of the vehicle architecture. ADAS has therefore been extensively adopted in today's commercial vehicles at low cost.

On the other hand, an increasing number of companies are extremely interested in the research and development of high-level autonomy, where an autonomous car can drive itself instead of only providing driver assistance. At this level of autonomy, vehicles are required to sense the surrounding environment like humans, including obstacle distances, signalization, location and moving pedestrians, and to make decisions like humans. These requirements lead to the adoption and integration of a large set of new sensing devices, information processing algorithms and hardware computing units, and in turn to new automotive E/E architecture designs where safety, security, performance, power consumption, cost, etc., must be carefully considered.


Corresponding author at: Electrical and Computer Engineering Department, Duke University, Durham, NC 27708, USA.
E-mail addresses: [email protected] (W. Shi), [email protected] (M.B. Alawieh), [email protected] (X. Li), [email protected] (H. Yu).

http://dx.doi.org/10.1016/j.vlsi.2017.07.007
Received 20 May 2017; Received in revised form 24 July 2017; Accepted 26 July 2017
Available online 29 July 2017

In spite of all the technical and social challenges for adopting autonomous vehicles, autonomy technologies are being developed with significant investment and at a fast rate [5,6]. Among them, visual perception is one of the most critical technologies, as all important decisions made by autonomous vehicles rely on visual perception of the surrounding environment. Without correct perception, any decision made to control a vehicle is not safe. In this paper, we present a brief survey on various perception algorithms and the underlying hardware platforms that execute these algorithms for real-time operation. In particular, machine learning and computer vision algorithms are often used to process the sensing data and derive an accurate understanding of the surrounding environment, including vehicle and pedestrian detection, lane detection, drivable surface detection, etc. Based upon the perception outcome, an intelligent system can further make decisions to control and manipulate the vehicle.

To meet the competitive requirements on computing for real-time operation, special hardware platforms have been designed and implemented. Note that machine learning and computer vision algorithms are often computationally expensive and, therefore, require a powerful computing platform to process the data in a timely manner. On the other hand, a commercially competitive system must be energy-efficient and low-cost. In this paper, a number of different possible choices for hardware implementation are briefly reviewed, including CPU, GPU, FPGA, ASIC, etc.

The remainder of this paper is organized as follows. Section 2 overviews autonomous vehicles, and several visual perception algorithms are summarized in Section 3. Important hardware platforms for implementing perception algorithms are discussed in Section 4. Finally, we conclude in Section 5.

2. Autonomous vehicles

As an intelligent system, an autonomous car must automatically sense the surrounding environment and make correct driving decisions by itself. In general, the function components of an autonomous driving system can be classified into three categories: (i) perception, (ii) decision and control, and (iii) vehicle platform manipulation [7].

The perception system of an autonomous vehicle perceives the environment and its interaction with the vehicle. Usually, it covers sensing, sensor fusion, localization, etc. By integrating all these tasks, we generate an understanding of the external world based on sensor data. Given the perception information, a driving system must make appropriate decisions to control the vehicle. The objective is to navigate a vehicle by following a planned route to the destination while avoiding collisions with any static or dynamic obstacle. To achieve this goal, the decision and control functions compute the global route based on a map in the database, constantly plan the correct motion and generate local trajectories to avoid obstacles.

Once the driving decision is made, the components for vehicle platform manipulation execute the decision and ensure the vehicle acts in an appropriate manner. They generate control signals for propulsion, steering and braking. Since most traditional vehicles have already adopted an electrical controlling architecture, manipulation units usually do not require any major modification of the architecture. Additionally, vehicle platform manipulation may cover emergency safety operations in case of system failure.

As the interface between the real world and the vehicle, an accurate perception system is extremely critical. If inaccurate perception information is used to guide the decision and control system, an autonomous car may make incorrect decisions, resulting in poor driving efficiency or, worse, an accident. For example, if the traffic sign detection system misses a STOP sign, the vehicle may not make the correct decision to stop, thereby leading to an accident.

Among all perception functions, visual perception is one of the most important components. It interprets visual data from multiple cameras and performs critical tasks such as vehicle and pedestrian detection. Although an autonomous driving system usually has other non-visual sensors, cameras are essential because they mimic human eyes and most traffic rules are designed by assuming the ability of visual perception. For example, many traffic signs share similar physical shapes and are differentiated only by their colored patterns, which can only be captured by a visual perception system. In the next section, we review several important applications for visual perception and highlight the corresponding algorithms.

3. Visual perception algorithms

Visual perception is mainly used for detecting obstacles that can be either dynamic (e.g., vehicle and pedestrian) or static (e.g., road curb and lane marker). Different obstacles may have dramatically different behaviors or represent different driving rules. For example, a road curb defines the strict boundary of the road, and exceeding this boundary must be avoided. A lane marker, however, defines the "soft" boundary of a driving lane which vehicles may cross if necessary. Therefore, it is not sufficient to detect obstacles only. A visual perception algorithm must accurately recognize the obstacles of interest. In addition to obstacle detection, visual perception is also used for drivable surface detection, where an autonomous vehicle needs to detect possible drivable space even when it is off-road (e.g., in a parking lot) or when the road is not clearly defined by road markers (e.g., on a forest road). Over the past several decades, a large body of perception algorithms has been developed. However, due to the page limit, we will only review a small number of the most representative algorithms in this paper.

3.1. Vehicle and pedestrian detection

Detecting vehicles and pedestrians lies at the center of driving safety. Tremendous research efforts have been devoted to developing accurate, robust and fast detection algorithms. Most traditional detection methods are composed of two steps. First, important features are extracted from a raw image. A feature is an abstraction of image pixels, such as the gradient of pixels or the similarity between a local image patch and designed patterns. Features can be considered the low-level understanding of a raw image. A good feature efficiently represents the valuable information required by detection while robustly tolerating distortions such as rotation of the image, variation of illumination conditions, scaling of the object, etc. Next, once the features are available, a learning algorithm is applied to further inspect the feature values and recognize the scene represented by the image. By adopting an appropriate algorithm for feature selection (e.g., AdaBoost [8,9]), a small number of important features are often chosen from a large set of candidates to build an efficient classifier.

Histogram of oriented gradients (HoG) [10] is one of the most widely adopted features for object detection. When calculating the HoG feature, an image is divided into a grid of cells and carefully normalized over local areas. The histogram of the image gradients in a local area forms a feature vector. The HoG feature is carefully hand-crafted and can achieve high accuracy in pedestrian and vehicle detection. It also carries relatively low computational cost, which makes it popular in real-time applications such as autonomous driving. However, the design of hand-crafted features such as HoG requires extensive domain-specific knowledge, thereby limiting the successful development of new features.
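To make the feature-plus-classifier recipe concrete, the following minimal sketch runs OpenCV's built-in HoG descriptor with its default pedestrian SVM over a single image. It only illustrates the traditional pipeline discussed above, not the specific detectors surveyed here; the input file name is a placeholder.

```python
import cv2

# HoG feature extractor paired with a linear SVM trained for pedestrians,
# as shipped with OpenCV; both follow the feature + classifier recipe above.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("camera_frame.png")   # placeholder input image
boxes, weights = hog.detectMultiScale(frame,
                                      winStride=(8, 8),
                                      padding=(8, 8),
                                      scale=1.05)  # image pyramid step
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```

In practice, the detector is applied to every camera frame, and the resulting boxes are passed downstream to tracking and decision modules.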


Alternatively, we may generate new features based on existing hand-crafted features such as HoG. For instance, the authors of [11] propose to add an extra middle layer for feature extraction after computing the low-level features. The proposed middle layer combines different types of low-level features by processing them with a variety of filter patterns. Learning methods such as RealBoost are then applied to select the best combinations of low-level features, and these combinations become the new features. Although the computational cost of generating the new features increases, these approaches can achieve higher detection accuracy than conventional methods relying on low-level features only.

More recently, the breakthrough of the convolutional neural network (CNN) poses a radically new approach where feature extraction is fully integrated into the learning process and all features are automatically learned from the training data [12]. A CNN is often composed of multiple layers. In a single convolutional layer, the input image is processed by a set of filters and the output can be further passed to the following convolutional layers. The filters at all convolutional layers are learned from the training data, and such a learning process can be conceptually viewed as automatic feature extraction. CNN has demonstrated state-of-the-art accuracy for pedestrian detection [12]. However, it is computationally expensive: billions of floating-point operations are often required to process a single image. To address this complexity issue, Faster R-CNN [13] and YOLO [14] have been proposed in the literature to reduce computational cost and, consequently, achieve real-time operation.
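The sketch below illustrates, under simplifying assumptions (a single grayscale input, random weights standing in for learned ones, valid padding), what one convolutional layer computes. It is only meant to make the idea of learned filters acting as automatic feature extractors concrete, not to reproduce any network cited above.

```python
import numpy as np

def conv_layer(image, kernels):
    """Apply a bank of filters to a single-channel image (stride 1, valid padding)."""
    kh, kw = kernels.shape[1:]
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((len(kernels), oh, ow))
    for k, w in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * w)
    return np.maximum(out, 0.0)   # ReLU non-linearity

image = np.random.rand(64, 64)            # stand-in for a grayscale image crop
kernels = np.random.rand(8, 3, 3) - 0.5   # in a real CNN these weights are learned
feature_maps = conv_layer(image, kernels)
print(feature_maps.shape)                 # (8, 62, 62): one response map per filter
```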
3.2. Lane detection

Lane detection is an essential component for autonomous vehicles driving on both highways and urban streets. Failure to correctly detect a lane may break traffic rules and endanger the safety of not only the autonomous vehicle itself but also other vehicles on the road. Today, lanes are mostly defined by lane markings which can only be detected by visual sensors. Therefore, designing real-time vision algorithms plays an irreplaceable role in reliable lane detection.

To facilitate safe and reliable driving, lane detection must be robustly implemented under non-ideal illumination and lane marking conditions. In [15], a lane detection algorithm is developed that is able to deal with challenging scenarios such as a curved lane, worn lane markings, and lane changes including merging and splitting. The proposed approach adopts a probabilistic framework to combine object recognition and tracking, achieving robust and real-time detection.

However, the approach in [15] relies on motion models of the vehicle and requires information from inertial sensors to track lane markings. It may break down when the motion of a vehicle shows random patterns. To address this issue, the authors of [16] propose a new approach that characterizes the tracking model by assuming static lane markings, without relying on any knowledge about vehicle motion. As such, it has demonstrated superior performance for extremely challenging scenarios during both daytime and nighttime.
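For orientation, the snippet below shows a bare-bones classical lane-marking detector: edge detection followed by a Hough transform restricted to a region of interest in front of the vehicle. It is a simplistic baseline for illustration only, not the probabilistic tracking methods of [15] or [16]; the file name and region coordinates are placeholder assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("road_frame.png")           # placeholder input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Keep only a trapezoidal region roughly in front of the vehicle.
h, w = edges.shape
mask = np.zeros_like(edges)
roi = np.array([[(0, h), (w // 2 - 50, h // 2), (w // 2 + 50, h // 2), (w, h)]], np.int32)
cv2.fillPoly(mask, roi, 255)
edges = cv2.bitwise_and(edges, mask)

# Fit straight line segments to the remaining edge pixels.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                        minLineLength=40, maxLineGap=20)
if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
```

Such a baseline fails exactly in the worn-marking, poor-illumination and random-motion cases that motivate the more sophisticated approaches discussed above.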
3.3. Drivable surface detection

One of the fundamental problems in autonomous driving is to identify the collision-free surface where a vehicle can safely drive. Although obstacle detection plays an important role in constraining the surface and defining the un-drivable space, it is not sufficient to fully determine the drivable space, for two reasons. First, it is extremely difficult, if not impossible, to detect all possible physical objects in real life. Various objects may act as obstacles and not all of them can be precisely recognized by a detection algorithm. Second, a number of obstacles may not be described by a physical and well-characterized form. For example, a bridge edge and a water surface are both obstacles over which a vehicle cannot drive. Therefore, obstacle detection alone does not serve as a complete solution. In many autonomous vehicles, additional sensors such as LIDAR are deployed to accurately detect the drivable surface. These approaches, however, are expensive, and there is a strong interest in developing alternative, cost-effective approaches.

Semantic segmentation is a promising technique to address the aforementioned problem. It labels every pixel in an image with the object it belongs to. The object may be a car, a building or the road itself. By using semantic segmentation, an autonomous vehicle can directly locate the drivable space. Conventional algorithms for semantic segmentation adopt random field labeling, where the dependencies among labels are modeled by combining features such as color and texture. However, these conventional algorithms rely on hand-crafted features that are not trivial to identify. In [17], a CNN is trained to extract local features automatically. These features are computed at multiple resolutions and are thus robust to scaling. It has been demonstrated that the CNN approach outperforms other state-of-the-art methods in the literature. However, adopting a CNN results in expensive computational cost. For this reason, the authors of [18] reduce the estimation of drivable space to an inference problem on a 1-D graph and use simple, light-weight techniques for real-time feature computation and inference. Experimental results have been presented to demonstrate its superior performance even on challenging datasets.
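Once a segmentation model produces per-pixel class labels, extracting the drivable space reduces to selecting the pixels whose labels are drivable, as in the minimal sketch below. The class IDs, image size and the random label map are purely illustrative placeholders rather than the output of any model cited above.

```python
import numpy as np

# labels: per-pixel class IDs from a semantic segmentation model (H x W).
# The IDs below are illustrative; real models define their own label maps.
ROAD, SIDEWALK, PARKING = 0, 1, 2
DRIVABLE = [ROAD, PARKING]

labels = np.random.randint(0, 5, size=(480, 640))   # stand-in for a model output
drivable = np.isin(labels, DRIVABLE)                 # boolean drivable-space mask

# Fraction of drivable pixels in the bottom third of the image, i.e. directly
# ahead of the vehicle - a crude sanity signal that a planner could consume.
ahead = drivable[-160:, :]
print("drivable fraction ahead:", ahead.mean())
```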


4. Hardware platforms

In the last decade, autonomous vehicles have attracted worldwide attention. In addition to algorithm research, hardware development is extremely important for operating an autonomous vehicle in real time. The Urban Challenge organized by DARPA in 2007 required each team to demonstrate an autonomous vehicle navigating in a given environment where complex maneuvers such as merging, passing, parking and negotiating intersections are tested [1]. After its great success, autonomous driving has been considered technically feasible and, consequently, has moved to the commercialization phase. For this reason, various autonomous driving systems are being developed by industry. Academic researchers are also actively involved in this area to develop novel ideas and methodologies that further improve performance, enhance reliability and reduce cost.

The hardware system of an autonomous vehicle is composed of sensors (e.g., camera, LIDAR, radar, ultrasonic sensor, etc.), computing devices and a drive-by-wire vehicle platform [19]. In this section, we first briefly summarize the sensing and computing systems for autonomous driving demonstrated by several major industrial and academic players. Next, we further describe the most recent progress in high-performance computing devices that facilitate autonomous driving.

4.1. Sensing systems

The camera is one of the most critical components for visual perception. Typically, the spatial resolution of a camera in an autonomous vehicle ranges from 0.3 megapixel to 2 megapixels [20,21]. A camera can generate a video stream at 10–30 fps and captures important objects such as traffic lights, traffic signs, obstacles, etc., in real time.

In addition to the camera, LIDAR is another important sensor. It measures the distance between the vehicle and obstacles by actively illuminating the obstacles with laser beams [22]. Typically, a LIDAR system scans the surrounding environment periodically and generates multiple measurement points. This "cloud" of points can be further processed to compute a 3D map of the surrounding environment [23]. LIDAR is known to be relatively robust and accurate [24], but it is also expensive.

Alternatively, a stereo camera can be used to interpret the 3D environment [25]. It is composed of two or more individual cameras. Knowing the relative spatial locations of all individual cameras, a depth map can be computed by comparing the differences between the images from the different cameras, and the distance of an object in the scene can thus be estimated. Generally, a stereo camera is less expensive than a LIDAR. However, a stereo camera is a passive sensor and is sensitive to environmental artifacts and/or noise caused by bad weather and poor illumination.
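As a small illustration of this idea, the sketch below computes a block-matching disparity map from a rectified stereo pair with OpenCV and converts it to depth. The file names and the calibration values (focal length, baseline) are placeholder assumptions, not values from any of the systems discussed in this survey.

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classical block matching over a 64-pixel disparity search range.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # fixed-point output

# With focal length f (pixels) and baseline B (meters): depth = f * B / disparity.
f, B = 700.0, 0.54          # example calibration values, assumed for illustration
depth = (f * B) / (disparity + 1e-6)
```

The quality of the resulting depth map degrades with poor texture, rain or low light, which is exactly the passive-sensing weakness noted above.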
Besides cameras and LIDAR, radar and ultrasonic sensors are also widely used to detect obstacles. Their detection areas can be short-range and wide-angle, mid-range and wide-angle, or long-range and narrow-angle [22]. For applications such as crash detection and blind spot detection [26], a short detection range of 20–30 m is commonly used [27]. For other applications such as cruise control, a long detection range of 200 m is required [27]. Ultrasonic sensors are similar to radars, but they use high-frequency sound waves, instead of radio waves, to detect objects. Both radars and ultrasonic sensors do not capture the detailed information of an obstacle (e.g., color, texture, etc.) and cannot classify the obstacle into different categories (e.g., vehicle, pedestrian, etc.).

Table 1 summarizes the sensors adopted by today's autonomous vehicles. Note that most autonomous vehicles integrate multiple types of sensors for two important reasons. First, fusing the data from multiple sensors improves the overall perception accuracy. For example, a LIDAR system can quickly detect the regions of interest and a camera system can apply highly accurate object detection algorithms to further analyze these important regions. Second, different layers of sensors with overlapping sensing areas provide additional redundancy and robustness to ensure high accuracy and reliability. For instance, when the camera system fails to detect an incoming vehicle, the radar system can act as a fail-safe and prevent an accident from happening.

Table 1
Sensors for autonomous vehicles: camera, LIDAR, radar and ultrasonic configurations of VIAC [28] and Junior [24] (2011), CMU [20] and V-Charge [21] (2013), Bertha Benz drive [30] and A1 [29] (2014), and BMW [31] (2015). The reported sensor suites range from multi-camera setups with stereo and fish-eye cameras, through multilayer and single-layer laser scanners, to short-, middle- and long-range radars and ultrasonic sensors.

4.2. Computing systems

For autonomous driving, a powerful computing system is required to interpret a large amount of sensing data and perform complex perception functions in real time. To achieve this goal, a variety of computing architectures have been proposed, such as multicore CPU systems [24], heterogeneous systems [20], distributed systems [29], etc. Table 2 summarizes the major computing systems adopted by several state-of-the-art autonomous vehicles.

As shown in Table 2, most systems are composed of more than one computing device. For instance, BMW adopts a standard personal computer (PC) and a real-time embedded prototyping computer (RTEPC) [31]. They are connected by a direct Ethernet connection. The PC is connected to multiple sensors and vehicle bus signals through Ethernet and CAN buses. It fuses the data from all sensors to fully understand the external environment. The PC also stores a database of high-precision maps. Meanwhile, the RTEPC is connected to the actuators by CAN buses for steering, braking and throttle control. It performs a variety of important functions such as localization, trajectory planning and control.

A similar system with separate computing devices can also be found in the autonomous vehicle designed by Stanford [24]. Its computing system is composed of two multicore CPU servers. A 12-core server runs vision and LIDAR algorithms, while the other, 6-core server performs planning, control, and low-level communication tasks.

The autonomous vehicle designed by Carnegie Mellon deploys four computing devices, each of which is equipped with one CPU and one GPU [20]. All computing devices are interconnected by Ethernet connections to support high computing power for complicated algorithms as well as to tolerate possible failure events. In addition, a separate interface computing device runs the user application that controls the vehicle via a touch-screen interface. This idea of using a cluster of computing devices is shared by the European V-Charge project led by ETH Zurich [21], where the computing system is composed of a cluster of six personal computers.


Table 2
Computing systems for autonomous vehicles.

Year | Vehicle | Computing system
2011 | Junior [24] | One Intel Xeon 12-core server and one Intel Xeon 6-core server
2013 | CMU [20] | One computing device equipped with an Intel Atom Processor D525, and four mini-ITX motherboards equipped with NVIDIA GT530 GPUs and Intel Core 2 Extreme Processor QX9300s
2013 | V-Charge [21] | Six personal computers
2014 | A1 [29] | Two embedded industrial computers, a rapid controller prototyping electronic computing unit and 13 32-bit microcontroller-based electronic computing units
2015 | BMW [31] | One standard personal computer and one real-time embedded prototyping computer

The A1 car designed by Hanyang University further distributes its computing functions over more devices [29]. It adopts a distributed computing system consisting of two embedded industrial computers, a rapid controller prototyping electronic computing unit and 13 microcontroller-based electronic computing units. The two high-performance embedded industrial computers provide the computing power to run sensor fusion, planning, and vision algorithms. The rapid controller prototyping electronic computing unit is particularly designed for real-time operation and is therefore used for time-critical tasks such as vehicle control. The 13 electronic computing units are used for braking, steering, acceleration, etc. In order to achieve real-time response, these computing devices are placed next to the actuators to reduce communication latency.

While the aforementioned hardware systems have been successfully designed and adopted for real-time operation of autonomous driving, their performance (measured by accuracy, throughput, latency, power, etc.) and cost (measured by price) remain noncompetitive for high-volume production and commercial deployment. Hence, radically new hardware implementations must be developed to address both the technical challenges and the market needs in this field, as will be further discussed in the next sub-section.

4.3. Computing devices

The aforementioned autonomous vehicles have successfully demonstrated their self-driving capabilities by using conventional computing systems. However, their performance and cost are still noncompetitive for commercial deployment, and new computing devices must be developed to improve performance, enhance reliability and reduce cost. In this sub-section, we review the recent advances in the field driven by major industrial and academic players.

4.3.1. Graphics processing units

The graphics processing unit (GPU) is conventionally designed and used for graphics processing tasks. Over the past decades, the advance of GPUs has been driven by the real-time performance requirements of complex, high-resolution 3D scenes in computer games, where tremendous parallelism is inherent [32]. Today, the general-purpose graphics processing unit (GPGPU) is also widely used for high-performance computing (HPC). It has demonstrated promising performance for scientific applications such as cardiac bidomain simulation [33], biomolecular modeling [34], quantum Monte Carlo [35], etc. A GPU contains hundreds or even thousands of parallel processors and can achieve substantially higher throughput than a CPU when running massively parallel algorithms. To reduce the complexity of GPU programming, parallel programming tools such as CUDA [36] and OpenCL [37] have been developed in the literature.

Many computer vision and machine learning algorithms used for automotive perception are inherently parallel and, therefore, fit the aforementioned GPU architecture. For example, the convolutional neural network (CNN) is a promising technique for autonomous perception [38,39]. The computational cost of evaluating a CNN is dominated by the convolution operations between the neuron layers and a number of spatial filters, which can be substantially accelerated by a GPU. Therefore, a large number of computer vision and deep learning tools such as OpenCV [40] and Caffe [41] have taken advantage of the GPU to improve throughput.
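To give a sense of the scale and the parallelism involved, the back-of-the-envelope sketch below counts the multiply-accumulate operations in a single convolutional layer; the layer dimensions are illustrative assumptions, not taken from any network cited here. Every output value can be computed independently, which is exactly the kind of workload that a GPU's thousands of cores exploit.

```python
# Rough count of multiply-accumulate (MAC) operations for one convolutional layer.
# All sizes below are illustrative placeholders.
h, w = 224, 224          # output feature map resolution
c_in, c_out = 64, 128    # input / output channels
k = 3                    # filter size (k x k)

macs = h * w * c_in * c_out * k * k
print(f"{macs / 1e9:.1f} GMACs for a single layer")   # ~3.7 GMACs

# A full network stacks tens of such layers, so billions of operations per image
# is the norm - and each output pixel can be computed in parallel.
```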
For this reason, the GPU has been considered a promising computing device for the application of autonomous driving. However, a GPU often consumes substantial energy. For instance, the NVIDIA Tesla K40 is used in [38] and its power consumption is around 235 W. Such a high power consumption poses two critical issues. First, it increases the load on the power generation system inside a vehicle. Second, and more importantly, it makes heat dissipation extremely challenging, because the environmental temperature inside a vehicle is often significantly higher than normal room temperature. To address these issues, various efforts have been made to design and implement mobile GPUs with reduced power consumption.

For instance, NVIDIA has released its mobile GPU Tegra X1, implemented in a TSMC 20 nm technology [42]. It is composed of a 256-CUDA-core GPU and two quad-core ARM CPU clusters, as shown in Fig. 1. It also contains an end-to-end 4K 60 fps pipeline which supports high-performance video encoding, decoding and display. In addition, it offers a number of I/O interfaces such as USB 3.0, HDMI, serial peripheral interface, etc. The two ARM CPU clusters are implemented with different options: (i) a high-performance ARM quad-core A57, and (ii) a power-efficient ARM quad-core A53. When running a given set of applications, the system can switch between the high-performance and low-power cores to achieve maximum power efficiency as well as optimal performance. Tegra X1 is one of the key chips in the DRIVE PX Auto-Pilot Platform marketed by NVIDIA for autonomous driving [42].

Fig. 1. Simplified architecture for NVIDIA Tegra X1 [42].

At its peak performance, Tegra X1 offers over 1 TFLOPS for 16-bit operations and over 500 GFLOPS for 32-bit operations. It is designed to improve power efficiency by optimizing its computing cores, reorganizing its GPU architecture, improving memory compression, and adopting a 20 nm technology. While a conventional GPU consumes hundreds of watts, Tegra X1 only consumes a few watts. For example, when implementing the CNN model GoogLeNet, Tegra X1 achieves a throughput of 33 images per second while consuming only 5.0 W [43]. Its energy efficiency is 5.4× better than that of the conventional desktop GPU Titan X.
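A quick calculation based on the figures quoted above expresses this in images processed per joule; the Titan X value is only what is implied by the 5.4× ratio, not an independently reported measurement.

```python
# Energy efficiency implied by the GoogLeNet numbers quoted above.
tegra_fps, tegra_watts = 33.0, 5.0
tegra_img_per_joule = tegra_fps / tegra_watts        # 6.6 images per joule
titan_img_per_joule = tegra_img_per_joule / 5.4      # implied by the 5.4x claim
print(tegra_img_per_joule, round(titan_img_per_joule, 2))
```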


4.3.2. Field-programmable gate arrays

A field-programmable gate array (FPGA) is an integrated circuit that can be configured to implement different digital logic functions. Conventionally, the FPGA was used as an emulation platform for early-stage validation of application-specific integrated circuits (ASICs). It is now extensively used for HPC, for two reasons. First, an FPGA is reconfigurable and the same FPGA fabric can be programmed to implement different logic functions. Compared to a conventional ASIC design, an FPGA design reduces the non-recurring engineering (NRE) cost by reducing the required design and validation time. Second, an FPGA is programmed for a given application with its specific computing architecture. Hence, it improves computing efficiency and reduces energy consumption over CPUs and/or GPUs, whose architectures are designed for general-purpose computing.

Conventionally, an FPGA-based design is described in a hardware description language (HDL) such as Verilog or VHDL. The design is specified at the register-transfer level (RTL) by registers and the combinational logic between these registers. This is a low-level abstraction, and designers must appropriately decide the detailed hardware architecture and carefully handle the massive concurrency between different hardware modules. Once the RTL description is available, it is further synthesized by an EDA tool to generate the netlist mapped onto the FPGA. Such a conventional design methodology is time-consuming and requires FPGA designers to fully understand all low-level circuit details.

Recently, with the advance of high-level synthesis (HLS), FPGA designers can write high-level specifications in C, C++ or SystemC. An HLS tool, such as Altera OpenCL or Xilinx HLS, is used to compile the high-level description into HDL. Furthermore, designers can control the synthesis process by directly incorporating different "hints" into the high-level description. The advance of HLS has made a broad and significant impact on the community, as it greatly reduces the overall design cost and shortens the time-to-market.

An appropriately optimized FPGA design has been demonstrated to be more energy-efficient than a CPU or GPU for a variety of computer vision algorithms such as optical flow, stereo vision, local image feature extraction, etc. [44]. In the literature, numerous research efforts have been made by both academic and industrial researchers to implement computer vision algorithms for the perception system required by autonomous driving. Among them, the CNN is one of the most promising solutions developed in recent years.

For instance, Altera has released a CNN accelerator for the FPGA devices Stratix 10 and Arria 10 manufactured in a 20 nm technology. Both devices have built-in DSP units to efficiently perform floating-point operations. At its peak performance, Arria 10 can process hundreds of GFLOPS and Stratix 10 several TFLOPS. The CNN accelerator is implemented with the Altera OpenCL programming language [45]. Fig. 2 shows the architecture of the aforementioned accelerator. It is composed of several computing kernels, each of which implements one CNN layer. Different kernels are connected by OpenCL channels or pipes for data transmission without access to external memory, thereby reducing power consumption. The same OpenCL channels or pipes can be used to transmit data between the FPGA and other external devices such as cameras.

Fig. 2. Simplified architecture for the CNN accelerator implemented by Altera [45].

As an alternative example, Microsoft has developed a high-throughput CNN accelerator built upon the Altera Stratix V and has mapped the design to Arria 10 [46]. Fig. 3 shows the simplified architecture of the accelerator. It can be configured by a software engine at run-time. Inside the accelerator, the input buffers and weight buffers store the image pixels and filter kernels, respectively. A large array of processing elements (PEs), e.g., thousands of PEs, efficiently computes the dot-product values for convolution. A network-on-chip passes the outputs of the PEs back to the input buffers. When running the accelerator, image pixels are read from DRAM into the input buffers. Next, the PEs compute the convolution between image pixels and filter kernels for one convolutional layer of the CNN and then store the results back into the input buffers. These results are then circulated as the input data for the next convolutional layer. In this way, the intermediate results of the CNN are not written back to DRAM, and thus the amount of data communication between the DRAM and the FPGA is substantially reduced.

Fig. 3. Simplified architecture for the CNN accelerator implemented by Microsoft [46].

Table 3 compares the energy efficiency of a CPU, a GPU and the aforementioned two FPGA accelerators for the AlexNet CNN. It is straightforward to observe that the FPGA accelerators improve the energy efficiency, measured as throughput over power, compared to the CPU and GPU.

Table 3
Performance comparison of CPU, GPU and FPGA accelerators for CNN.

Platform | Throughput (frames/second) | Power (W) | Efficiency (frames/second/W)
CPU, dual Xeon E5-2699 [45] | 1320 | 321 | 4.11
GPU, Tesla K40 [46] | 824 | 235 | 3.5
FPGA, Arria 10 GX 1150 [45] | 1200 | 130 | 9.27
FPGA, Arria 10 GX 1150+ [46] | 233 | 25 | 9.32
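The efficiency metric in Table 3 is simply throughput divided by power, as the short check below illustrates using the table's own throughput and power values.

```python
# Recomputing the efficiency column of Table 3 as throughput / power.
# Small deviations from the printed figures are rounding in the original table.
platforms = {
    "CPU, dual Xeon E5-2699 [45]":   (1320.0, 321.0),
    "GPU, Tesla K40 [46]":           (824.0, 235.0),
    "FPGA, Arria 10 GX 1150 [45]":   (1200.0, 130.0),
    "FPGA, Arria 10 GX 1150+ [46]":  (233.0, 25.0),
}
for name, (fps, watts) in platforms.items():
    print(f"{name}: {fps / watts:.2f} frames/second/W")
```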


4.3.3. Application-specific integrated circuits

Although the FPGA offers a general reconfigurable solution where an application-specific design can be implemented to reduce the overhead posed by CPUs and/or GPUs, it often suffers from slow operation speed and large chip area. These disadvantages are inherent in its reconfigurability, where logic functions and interconnect wires are programmed by using lookup tables and switches. Compared to the FPGA, an ASIC is able to achieve superior performance by sacrificing low-level reconfigurability.

ASIC implementation poses a significant NRE cost for both design and validation, especially for today's large-scale systems. The NRE cost is high because a manufactured chip may fail to work and, hence, several design iterations are often required. Once the ASIC design is validated in silicon, it can be manufactured in high volume and the average cost per chip can be greatly reduced. For autonomous vehicles, the market size is tremendous and, therefore, justifies the high NRE cost of an ASIC design. In addition, the visual perception algorithms for autonomous driving have become relatively mature, thereby reducing the risk that an ASIC implementation is outdated after its long design cycle.

For instance, Mobileye has launched the EyeQ SoC to implement computationally intensive real-time algorithms for ADAS [47,48]. It broadly covers a number of important visual perception algorithms, including lane departure detection, vehicle detection, traffic sign recognition, etc. As shown in Fig. 4, the EyeQ SoC contains two ARM processors and four vision computing engines (i.e., a classifier engine, a tracker engine, a lane detection engine, and a window, pre-processing and filter engine).

Fig. 4. Simplified architecture for the EyeQ SoC implemented by Mobileye [47].

In the aforementioned architecture, one of the ARM processors is used to manage the vision computing engines as well as the other ARM processor. The other ARM processor is used for intensive computing tasks. The classifier engine is designed for image scaling, preprocessing, and pattern classification. The tracker engine is used for image warping and tracking. The lane detection engine identifies lane markers as well as road geometries. The window, preprocessing and filter engine is designed to convolve images, create image pyramids, detect edges, and filter images. Furthermore, a direct memory access (DMA) component is used for both on-chip and off-chip data transmission under the control of the ARM processor.

Recently, Mobileye has implemented the EyeQ2 SoC, an upgraded version of the EyeQ SoC, as shown in Fig. 5. It covers several additional applications including pedestrian protection, head lamp control, adaptive cruise control, headway monitoring and warning, etc. Different from the EyeQ SoC, the ARM processors are replaced by MIPS processors. Furthermore, three vector microcode processors with single instruction multiple data (SIMD) and very long instruction word (VLIW) support are added. In addition, the lane detection engine is removed, while two other vision computing engines are added for a feature-based classifier and stereo vision.

Fig. 5. Simplified architecture for the EyeQ2 SoC implemented by Mobileye [48].

Similar to the EyeQ SoC, one of the MIPS processors is used to control the vision computing engines, the vector microcode modules, the DMA and the other MIPS processor. The other MIPS processor, together with the vision computing engines, performs the computationally intensive tasks.

Besides Mobileye, Texas Instruments has developed the TDA3x SoC for ADAS [49]. It offers a variety of functions such as autonomous emergency braking, lane keep assist, advanced cruise control, traffic sign recognition, pedestrian and object detection, forward collision warning, etc. [50]. The simplified architecture of TDA3x is shown in Fig. 6. It uses a heterogeneous architecture composed of a DSP, an embedded vision engine, an ARM core, and an image signal processor. The DSP unit can operate at 750 MHz and contains two floating-point multipliers and six arithmetic units. The embedded vision engine is a vector processor operating at 650 MHz and is optimized for computer vision algorithms. The heterogeneous architecture of TDA3x facilitates multiple ADAS functions in real time.

Fig. 6. Simplified architecture for the TDA3x SoC implemented by Texas Instruments [49].

More recently, a number of advanced system architectures have been proposed to facilitate efficient implementation of deep learning algorithms. The tensor processing unit (TPU) by Google [51] and the dataflow processing unit (DPU) by Wave Computing are two such examples.

5. Conclusions

In this paper we briefly summarize the recent progress on visual perception algorithms and the corresponding hardware implementations that facilitate autonomous driving. In particular, a variety of algorithms are discussed for vehicle and pedestrian detection, lane detection and drivable surface detection. On the other hand, CPU, GPU, FPGA and ASIC are presented as the major components for forming an efficient hardware platform for real-time computing and operation. While significant technical advances have been accomplished in this area, there remains a strong need to further improve both algorithm and hardware designs in order to make autonomous vehicles safe, reliable and comfortable. The technical challenges can be broadly classified into three categories:


• Algorithm design: Accurate and robust algorithms are needed to handle all corner cases so that an autonomous vehicle can appropriately operate over these scenarios. Such robustness is particularly important in order to ensure safety.

• Hardware design: Adopting increasingly accurate and robust algorithms often increases computational complexity and, hence, requires a powerful hardware platform to implement these algorithms. This, in turn, requires us to further improve both system architecture and circuit implementation in order to boost the computing power for real-time operation.

• System validation: Accurately and efficiently validating a complex autonomous system is non-trivial. Any visual perception system based on machine learning cannot be 100% accurate. Hence, the system may fail for a specific input pattern, and accurately estimating its rare failure rate can be extremely time-consuming [52].

To address the aforementioned challenges, academic researchers and industrial engineers from interdisciplinary fields such as artificial intelligence, hardware systems and automotive design must closely collaborate in order to achieve fundamental breakthroughs in the area.

References

[1] DARPA Urban Challenge, 2007. Online Available: 〈http://archive.darpa.mil/grandchallenge/〉.
[2] National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee, The National Artificial Intelligence Research and Development Strategic Plan, Oct. 2016. Online Available: 〈https://www.nitrd.gov/news/national_ai_rd_strategic_plan.aspx〉.
[3] Executive Office of the President, National Science and Technology Council Committee on Technology, Preparing for the Future of Artificial Intelligence, Oct. 2016. Online Available: 〈https://www.whitehouse.gov/blog/2016/10/12/administrations-report-future-artificial-intelligence/〉.
[4] SAE J3016, Taxonomy and Definitions for Terms Related to Driving Automation Systems for On-road Motor Vehicles, 2016. Online Available: 〈http://standards.sae.org/j3016_201609/〉.
[5] J. Markoff, Toyota Invests $1 Billion in Artificial Intelligence in U.S., New York Times, Nov. 2015. Online Available: 〈http://www.nytimes.com/2015/11/06/technology/toyota-silicon-valley-artificial-intelligence-research-center.html〉.
[6] D. Primack, K. Korosec, GM Buying Self-driving Tech Startup for More Than $1 Billion, Fortune, Mar. 2016. Online Available: 〈http://fortune.com/2016/03/11/gm-buying-self-driving-tech-startup-for-more-than-1-billion/〉.
[7] S. Behere, M. Törngren, A functional architecture for autonomous driving, ACM WASA, Montréal, QC, Canada, 2015.
[8] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, IEEE CVPR 1 (2001) 511–518.
[9] D. Jeon, Q. Dong, Y. Kim, X. Wang, S. Chen, H. Yu, D. Blaauw, D. Sylvester, A 23-mW face recognition processor with mostly-read 5T memory in 40-nm CMOS, IEEE J. Solid-State Circuits 52 (6) (2017) 1628–1642.
[10] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, IEEE CVPR 1 (2005) 886–893.
[11] S. Zhang, R. Benenson, B. Schiele, Filtered channel features for pedestrian detection, IEEE CVPR (2015) 1751–1760.
[12] L. Zhang, L. Lin, X. Liang, K. He, Is faster R-CNN doing well for pedestrian detection?, IEEE ECCV (2016) 443–457.
[13] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, NIPS (2015).
[14] J. Redmon, A. Farhadi, YOLO9000: Better, Faster, Stronger, Dec. 2016. Online Available: 〈https://arxiv.org/abs/1612.08242〉.
[15] Z. Kim, Robust lane detection and tracking in challenging scenarios, IEEE Trans. Intell. Transp. Syst. 9 (1) (2008) 16–26.
[16] R. Gopalan, T. Hong, M. Shneier, R. Chellappa, A learning approach towards detection and tracking of lane markings, IEEE Trans. Intell. Transp. Syst. 13 (3) (2012) 1088–1098.
[17] J. Alvarez, Y. LeCun, T. Gevers, A. Lopez, Semantic road segmentation via multi-scale ensembles of learned features, IEEE ECCV 2 (2012) 586–595.
[18] J. Yao, S. Ramalingam, Y. Taguchi, Y. Miki, R. Urtasun, Estimating drivable collision-free space from monocular video, IEEE WACV (2015) 420–427.
[19] T. Drage, J. Kalinowski, T. Braunl, Integration of drive-by-wire with navigation control for a driverless electric race car, IEEE Intell. Transp. Syst. Mag. 6 (4) (2014) 23–33.
[20] J. Wei, J. Snider, J. Kim, J. Dolan, R. Rajkumar, B. Litkouhi, Towards a viable autonomous driving research platform, IEEE IV (2013) 763–770.
[21] P. Furgale, U. Schwesinger, M. Rufli, W. Derendarz, H. Grimmett, P. Mühlfellner, S. Wonneberger, J. Timpner, S. Rottmann, B. Li, B. Schmidt, T.N. Nguyen, E. Cardarelli, S. Cattani, S. Brüning, S. Horstmann, M. Stellmacher, H. Mielenz, K. Köser, M. Beermann, C. Häne, L. Heng, G.H. Lee, F. Fraundorfer, R. Iser, R. Triebel, I. Posner, P. Newman, L. Wolf, M. Pollefeys, S. Brosig, J. Effertz, C. Pradalier, R. Siegwart, Toward automated driving in cities using close-to-market sensors: an overview of the V-Charge project, IEEE IV (2013) 809–816.
[22] J. Zolock, C. Senatore, R. Yee, R. Larson, B. Curry, The use of stationary object radar sensor data from advanced driver assistance systems (ADAS) in accident reconstruction, SAE Technical Paper no. 2016-01-1465, 2016.
[23] B. Douillard, J. Underwood, N. Kuntz, V. Vlaskine, A. Quadros, P. Morton, A. Frenkel, On the segmentation of 3D LIDAR point clouds, IEEE ICRA (2011) 2798–2805.
[24] J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Kolter, D. Langer, O. Pink, V. Pratt, M. Sokolsky, G. Stanek, D. Stavens, A. Teichman, M. Werling, S. Thrun, Towards fully autonomous driving: systems and algorithms, IEEE IV (2011) 163–168.
[25] D. Forsyth, J. Ponce, Computer Vision: A Modern Approach, Pearson, 2002.
[26] R. Mobus, U. Kolbe, Multi-target multi-object tracking, sensor fusion of radar and infrared, IEEE IV (2004) 732–737.
[27] NXP, Automotive radar millimeter-wave technology. Online Available: 〈http://www.nxp.com/pages/automotive-radar-millimeter-wave-technology:AUTRMWT〉.
[28] M. Bertozzi, L. Bombini, A. Broggi, M. Buzzoni, E. Cardarelli, S. Cattani, P. Cerri, A. Coati, S. Debattisti, A. Falzoni, R. Fedriga, M. Felisa, L. Gatti, A. Giacomazzo, P. Grisleri, M. Laghi, L. Mazzei, P. Medici, M. Panciroli, P. Porta, P. Zani, P. Versari, VIAC: an out of ordinary experiment, IEEE IV (2011) 175–180.
[29] K. Jo, J. Kim, D. Kim, C. Jang, M. Sunwoo, Development of autonomous car—part II: a case study on the implementation of an autonomous driving system based on distributed architecture, IEEE Trans. Ind. Electron. 62 (8) (2015) 5119–5132.
[30] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller, T. Dang, U. Franke, N. Appenrodt, C. Keller, E. Kaus, R. Herrtwich, C. Rabe, D. Pfeiffer, F. Lindner, F. Stein, F. Erbs, M. Enzweiler, C. Knöppel, J. Hipp, M. Haueis, M. Trepte, C. Brenk, A. Tamke, M. Ghanaat, M. Braun, A. Joos, H. Fritz, H. Mock, M. Hein, E. Zeeb, Making Bertha drive—an autonomous journey on a historic route, IEEE Intell. Transp. Syst. Mag. 6 (2) (2014) 8–20.
[31] M. Aeberhard, S. Rauch, M. Bahram, G. Tanzmeister, J. Thomas, Y. Pilat, F. Homm, W. Huber, N. Kaempchen, Experience, results and lessons learned from automated driving on Germany's highways, IEEE Intell. Transp. Syst. Mag. (2015) 42–57.
[32] J. Nickolls, W. Dally, The GPU computing era, IEEE Micro 30 (2) (2010) 56–69.
[33] A. Neic, M. Liebmann, E. Hoetzl, L. Mitchell, E. Vigmond, G. Haase, G. Plank, Accelerating cardiac bidomain simulations using graphics processing units, IEEE Trans. Biomed. Eng. 59 (8) (2012) 2281–2290.
[34] J. Vetter, R. Glassbrook, J. Dongarra, K. Schwan, B. Loftis, S. McNally, J. Meredith, J. Rogers, P. Roth, K. Spafford, S. Yalamanchili, Keeneland: bringing heterogeneous GPU computing to the computational science community, Comput. Sci. Eng. 13 (2011) 90–95.
[35] R. Weber, A. Gothandaraman, R. Hinde, G. Peterson, Comparing hardware accelerators in scientific applications: a case study, IEEE Trans. Parallel Distrib. Syst. 22 (1) (2011) 58–68.
[36] M. Garland, S. Le Grand, J. Nickolls, J. Anderson, J. Hardwick, S. Morton, E. Phillips, Y. Zhang, V. Volkov, Parallel computing experiences with CUDA, IEEE Micro 28 (4) (2008) 13–27.
[37] J. Stone, D. Gohara, G. Shi, OpenCL: a parallel programming standard for heterogeneous computing systems, Comput. Sci. Eng. 12 (3) (2010) 66–73.
[38] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, NIPS (2015) 91–99.
[39] C. Chen, A. Seff, A. Kornhauser, J. Xiao, DeepDriving: learning affordance for direct perception in autonomous driving, IEEE ICCV (2015).
[40] Open Computer Vision Library (OpenCV). Online Available: 〈http://opencvlibrary.sourceforge.net〉.
[41] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: convolutional architecture for fast feature embedding, ACM MM (2014).
[42] NVIDIA, NVIDIA Tegra X1. Online Available: 〈http://international.download.nvidia.com/pdf/tegra/Tegra-X1-whitepaper-v1.0.pdf〉.
[43] NVIDIA, GPU-Based Deep Learning Inference: A Performance and Power Analysis. Online Available: 〈https://www.nvidia.com/content/tegra/embedded-systems/pdf/jetson_tx1_whitepaper.pdf〉.
[44] K. Pauwels, M. Tomasi, J. Diaz Alonso, E. Ros, M. van Hulle, A comparison of FPGA and GPU for real-time phase-based optical flow, stereo, and local image features, IEEE Trans. Comput. 61 (7) (2012) 999–1012.
[45] Intel, Efficient Implementation of Neural Network Systems Built on FPGAs and Programmed with OpenCL. Online Available: 〈https://www.altera.com/en_US/pdfs/literature/solution-sheets/efficient_neural_networks.pdf〉.
[46] K. Ovtcharov, O. Ruwase, J. Kim, J. Fowers, K. Strauss, E. Chung, Accelerating deep convolutional neural networks using specialized hardware, Microsoft Research, Feb. 2015. Online Available: 〈https://www.microsoft.com/en-us/research/publication/accelerating-deep-convolutional-neural-networks-using-specialized-hardware/〉.
[47] Mobileye, EyeQ. Online Available: 〈http://www.mobileye.com/technology/processing-platforms/eyeq/〉.
[48] Mobileye, EyeQ2. Online Available: 〈http://www.mobileye.com/technology/processing-platforms/eyeq2/〉.
[49] TI, New TDA3x SoC for ADAS Solutions in Entry- to Mid-level Automobiles. Online Available: 〈http://www.ti.com/lit/ml/sprt708a/sprt708a.pdf〉.

[50] M. Mody, P. Swami, K. Chitnis, S. Jagannathan, K. Desappan, A. Jain, D. Poddar, Z. Nikolic, P. Viswanath, M. Mathew, S. Nagori, H. Garud, High performance front camera ADAS applications on TI's TDA3X platform, High Perform. Comput. (2015) 456–463.
[51] N. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D. Yoon, In-datacenter performance analysis of a tensor processing unit, ISCA (2017).
[52] W. Shi, M. Alawieh, X. Li, H. Yu, N. Arechiga, N. Tomatsu, Efficient statistical validation of machine learning systems for autonomous driving, IEEE/ACM ICCAD (2016).

