Algorithm and hardware implementation for visual perception system in autonomous vehicle
Keywords: Algorithm; Hardware; Autonomous vehicle; Visual perception

Abstract
This paper briefly surveys the recent progress on visual perception algorithms and their corresponding hardware implementations for the emerging application of autonomous driving. In particular, vehicle and pedestrian detection, lane detection and drivable surface detection are presented as three important applications for visual perception. On the other hand, CPU, GPU, FPGA and ASIC are discussed as the major components to form an efficient hardware platform for real-time operation. Finally, several technical challenges are presented to motivate future research and development in the field.
autonomy. Namely, an autonomous car can drive itself instead of providing driver assistance only. At this level of autonomy, vehicles are required to sense the surrounding environment like humans, such as the distance to obstacles, signalization, location and moving pedestrians, as well as to make decisions like humans. These requirements lead to the adoption and integration of a large set of new sensing devices, information processing algorithms and hardware computing units, and in turn to new automotive E/E architecture designs in which safety, security, performance, power consumption, cost, etc., must be carefully considered.

In spite of all the technical and social challenges of adopting autonomous vehicles, autonomy technologies are being developed with significant investment and at a fast rate [5,6]. Among them, visual perception is one of the most critical technologies, as all important decisions made by an autonomous vehicle rely on visual perception of the surrounding environment. Without correct perception, any decision made to control the vehicle is unsafe. In this paper, we present a brief survey of various perception algorithms and the underlying hardware platforms that execute these algorithms for real-time operation. In particular, machine learning and computer vision algorithms are often used to process the sensing data and derive an accurate understanding of the surrounding environment, including vehicle and pedestrian detection, lane detection, drivable surface detection, etc. Based upon the perception outcome, an intelligent system can further make decisions to control and manipulate the vehicle.
To meet the competitive requirements on computing for real-time operation, special hardware platforms have been designed and implemented. Note that machine learning and computer vision algorithms are often computationally expensive and, therefore, require a powerful computing platform to process the data in a timely manner. On the other hand, a commercially competitive system must be energy-efficient and low-cost. In this paper, a number of possible choices for hardware implementation are briefly reviewed, including CPU, GPU, FPGA, ASIC, etc.

The remainder of this paper is organized as follows. Section 2 overviews autonomous vehicles, and several visual perception algorithms are summarized in Section 3. Important hardware platforms for implementing perception algorithms are discussed in Section 4. Finally, we conclude in Section 5.
2. Autonomous vehicles

As an intelligent system, an autonomous car must automatically sense the surrounding environment and make correct driving decisions by itself. In general, the functional components of an autonomous driving system can be classified into three categories: (i) perception, (ii) decision and control, and (iii) vehicle platform manipulation [7].

The perception system of an autonomous vehicle perceives the environment and its interaction with the vehicle. Usually, it covers sensing, sensor fusion, localization, etc. By integrating all these tasks, it generates an understanding of the external world based on sensor data. Given the perception information, a driving system must make appropriate decisions to control the vehicle. The objective is to navigate the vehicle along a planned route to the destination while avoiding collisions with any static or dynamic obstacle. To achieve this goal, the decision and control functions compute the global route based on a map in the database, constantly plan the correct motion, and generate local trajectories to avoid obstacles.

Once a driving decision is made, the components for vehicle platform manipulation execute the decision and ensure that the vehicle acts in an appropriate manner. They generate control signals for propulsion, steering and braking. Since most traditional vehicles have already adopted an electrical control architecture, the manipulation units usually do not require any major modification of that architecture. Additionally, vehicle platform manipulation may cover emergency safety operations in case of system failure.

As the interface between the real world and the vehicle, an accurate perception system is extremely critical. If inaccurate perception information is used to guide the decision and control system, an autonomous car may make incorrect decisions, resulting in poor driving efficiency or, worse, an accident. For example, if the traffic sign detection system misses a STOP sign, the vehicle may not make the correct decision to stop, thereby leading to an accident.

Among all perception functions, visual perception is one of the most important components. It interprets visual data from multiple cameras and performs critical tasks such as vehicle and pedestrian detection. Although an autonomous driving system usually has other non-visual sensors, cameras are essential because they mimic human eyes and most traffic rules are designed by assuming the ability of visual perception. For example, many traffic signs share similar physical shapes and are differentiated by colored patterns that can only be captured by a visual perception system. In the next section, we review several important applications for visual perception and highlight the corresponding algorithms.

3. Visual perception algorithms

Visual perception is mainly used for detecting obstacles that can be either dynamic (e.g., vehicles and pedestrians) or static (e.g., road curbs and lane markers). Different obstacles may have dramatically different behaviors or represent different driving rules. For example, a road curb defines the strict boundary of the road, and exceeding this boundary must be avoided. A lane marker, however, defines the "soft" boundary of a driving lane, which a vehicle may cross if necessary. Therefore, it is not sufficient to detect obstacles only; a visual perception algorithm must accurately recognize the obstacles of interest. In addition to obstacle detection, visual perception is also used for drivable surface detection, where an autonomous vehicle needs to detect possible drivable space even when it is off-road (e.g., in a parking lot) or when the road is not clearly defined by road markers (e.g., on a forest road). Over the past several decades, a large body of perception algorithms has been developed. However, due to the page limit, we will only review a small number of the most representative algorithms in this paper.

3.1. Vehicle and pedestrian detection

Detecting vehicles and pedestrians lies at the center of driving safety. Tremendous research efforts have been devoted to developing accurate, robust and fast detection algorithms. Most traditional detection methods are composed of two steps. First, important features are extracted from the raw image. A feature is an abstraction of image pixels, such as the gradient of the pixels or the similarity between a local image patch and a designed pattern, and can be considered a low-level understanding of the raw image. A good feature efficiently represents the valuable information required for detection while robustly tolerating distortions such as image rotation, variation of illumination conditions, scaling of objects, etc. Next, once the features are available, a learning algorithm is applied to further inspect the feature values and recognize the scene represented by the image. By adopting an appropriate algorithm for feature selection (e.g., AdaBoost [8,9]), a small number of important features are often chosen from a large set of candidates to build an efficient classifier.
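As a concrete illustration of this two-step pipeline, the sketch below extracts HoG features and trains a boosted classifier on them. It is a minimal example that assumes hypothetical lists of labeled 64x128 grayscale patches (pos_patches and neg_patches); it is not the exact feature set or classifier configuration used in [8-10].

```python
# Minimal sketch of the classical two-step pipeline: hand-crafted
# features (HoG) followed by a boosted classifier.
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import AdaBoostClassifier

def hog_feature(patch):
    # Gradient histograms over a grid of cells, block-normalized locally.
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def train_detector(pos_patches, neg_patches):
    # pos_patches / neg_patches: lists of 64x128 grayscale arrays (assumed data).
    X = np.array([hog_feature(p) for p in pos_patches + neg_patches])
    y = np.array([1] * len(pos_patches) + [0] * len(neg_patches))
    # Boosting over decision stumps implicitly picks out a small set of
    # informative feature dimensions, in the spirit of [8,9].
    clf = AdaBoostClassifier(n_estimators=200)
    return clf.fit(X, y)

def classify_patch(clf, patch):
    return clf.predict(hog_feature(patch).reshape(1, -1))[0]
```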
Histogram of oriented gradients (HoG) [10] is one of the most widely adopted features for object detection. When calculating the HoG feature, an image is divided into a grid of cells, and the feature is carefully normalized over local areas. The histogram of the image gradients in a local area forms a feature vector. The HoG feature is carefully hand-crafted and can achieve high accuracy in pedestrian and vehicle detection. It also carries a relatively low computational cost, which makes it popular in real-time applications such as autonomous driving. However, the design of hand-crafted features such as HoG requires extensive
domain-specific knowledge, thereby limiting the successful development of new features.
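For reference, a widely available hand-crafted baseline of this kind is OpenCV's HOGDescriptor combined with its pre-trained pedestrian SVM. The snippet below is a usage sketch only, with "frame.jpg" as a placeholder input; it does not reproduce the exact detector of [10].

```python
# Off-the-shelf HoG-based pedestrian detection with OpenCV.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.jpg")            # placeholder image path
# Slide a 64x128 detection window over an image pyramid.
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```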
Alternatively, we may generate new features based on existing hand-crafted features such as HoG. For instance, the authors of [11] propose to add an extra middle layer for feature extraction after computing the low-level features. The proposed middle layer combines different types of low-level features by processing them with a variety of filter patterns. Learning methods such as RealBoost are then applied to select the best combinations of low-level features, and these combinations become the new features. Although the computational cost of generating the new features increases, these approaches can achieve higher detection accuracy than the conventional methods that rely on low-level features only.
More recently, the breakthrough in convolutional neural networks (CNNs) poses a radically new approach in which feature extraction is fully integrated into the learning process and all features are automatically learned from the training data [12]. A CNN is often composed of multiple layers. In a single convolutional layer, the input image is processed by a set of filters and the output can be further passed to the following convolutional layers. The filters at all convolutional layers are learned from the training data, and such a learning process can be conceptually viewed as automatic feature extraction. CNNs have demonstrated state-of-the-art accuracy for pedestrian detection [12]. However, they are computationally expensive, as billions of floating-point operations are often required to process a single image. To address this complexity issue, Fast R-CNN [13] and YOLO [14] have been proposed in the literature to reduce the computational cost and, consequently, achieve real-time operation.
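The structure described above can be made concrete with a few lines of PyTorch. The network below is a deliberately tiny sketch whose layer sizes and input resolution are illustrative assumptions, not the architecture of [12], [13] or [14].

```python
# Minimal CNN sketch: stacked convolutional layers whose filters are
# learned from data, followed by a small classifier head.
import torch
import torch.nn as nn

class TinyDetectorCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):                         # x: (N, 3, 64, 64)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One forward pass on a random image patch.
scores = TinyDetectorCNN()(torch.randn(1, 3, 64, 64))
```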
3.2. Lane detection

Lane detection is an essential component for autonomous vehicles driving on both highway roads and urban streets. Failure to correctly detect a lane may break traffic rules and endanger the safety of not only the autonomous vehicle itself but also other vehicles on the road. Today, lanes are mostly defined by lane markings that can only be detected by visual sensors. Therefore, designing real-time vision algorithms plays an irreplaceable role in reliable lane detection.

To facilitate safe and reliable driving, lane detection must be implemented robustly under non-ideal illumination and lane-marking conditions. In [15], a lane detection algorithm is developed that is able to deal with challenging scenarios such as curved lanes, worn lane markings, and lane changes including emerging and splitting lanes. The proposed approach adopts a probabilistic framework to combine object recognition and tracking, achieving robust and real-time detection.

However, the approach in [15] relies on motion models of the vehicle and requires information from inertial sensors to track lane markings. It may break down when the motion of the vehicle shows random patterns. To address this issue, the authors of [16] propose a new approach that characterizes the tracking model by assuming static lane markings, without relying on any knowledge about the vehicle motion. As such, it has demonstrated superior performance for extremely challenging scenarios during both daytime and nighttime.
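The snippet below is not the tracking-based method of [15] or [16]; it is a minimal single-frame baseline (Canny edge detection followed by a probabilistic Hough transform) that illustrates how straight lane-marking candidates can be extracted, with "road.jpg" as a placeholder image path.

```python
# Classical single-frame lane-marking baseline: edges + Hough transform.
import cv2
import numpy as np

img = cv2.imread("road.jpg")                  # placeholder image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Keep only the lower half of the image, where lane markings appear.
mask = np.zeros_like(edges)
mask[edges.shape[0] // 2:, :] = 255
edges = cv2.bitwise_and(edges, mask)

lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=20)
for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
```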
3.3. Drivable surface detection

One of the fundamental problems in autonomous driving is to identify the collision-free surface where a vehicle can safely drive. Although obstacle detection plays an important role in constraining the surface and defining the un-drivable space, it is not sufficient to fully determine the drivable space, for two reasons. First, it is extremely difficult, if not impossible, to detect all possible physical objects in real life. Various objects may act as obstacles, and not all of them can be precisely recognized by a detection algorithm. Second, a number of obstacles may not be described by a physical and well-characterized form. For example, a bridge edge and a water surface are both obstacles over which a vehicle cannot drive. Therefore, obstacle detection alone does not serve as a complete solution. For many autonomous vehicles, additional sensors such as LIDAR are deployed to accurately detect the drivable surface. These approaches, however, are expensive, and there is a strong interest in developing alternative, cost-effective approaches.

Semantic segmentation is a promising technique to address the aforementioned problem. It labels every pixel in an image with the object it belongs to; the object may be a car, a building or the road itself. By using semantic segmentation, an autonomous vehicle can directly locate the drivable space. The conventional algorithms for semantic segmentation adopt random field labeling, and the dependencies among labels are modeled by combining features such as color and texture. However, these conventional algorithms rely on hand-crafted features that are not trivial to identify. In [17], a CNN is trained to extract local features automatically. These features are computed at multiple resolutions and are thus robust to scaling. It has been demonstrated that the CNN approach outperforms other state-of-the-art methods in the literature. However, adopting a CNN incurs an expensive computational cost. For this reason, the authors of [18] reduce the estimation of drivable space to an inference problem on a 1-D graph and use simple, light-weight techniques for real-time feature computation and inference. Experimental results have been presented to demonstrate its superior performance even on challenging datasets.
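The sketch below shows how per-pixel labels can be turned into a drivable-space estimate with a simple column-by-column scan. It is only loosely inspired by the 1-D formulation of [18]; the label array, ROAD_ID and image layout are assumptions for illustration.

```python
# From per-pixel semantic labels to a per-column free-space estimate.
# `labels` is an assumed H x W array of class ids from any segmentation
# model; ROAD_ID marks the "road" class.
import numpy as np

ROAD_ID = 0

def free_space_per_column(labels):
    h, w = labels.shape
    road = (labels == ROAD_ID)
    horizon = np.full(w, h, dtype=int)
    for col in range(w):
        # Scan upward from the bottom row until the first non-road pixel.
        blocked = np.where(~road[::-1, col])[0]
        horizon[col] = h - blocked[0] if blocked.size else 0
    return horizon  # rows >= horizon[col] are estimated drivable
```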
4. Hardware platforms

In the last decade, autonomous vehicles have attracted worldwide attention. In addition to algorithm research, hardware development is extremely important for operating an autonomous vehicle in real time. The Urban Challenge organized by DARPA in 2007 required each team to demonstrate an autonomous vehicle navigating in a given environment where complex maneuvers such as merging, passing, parking and negotiating intersections are tested [1]. After its great success, autonomous driving has been considered technically feasible and, consequently, has moved toward the commercialization phase. For this reason, various autonomous driving systems are being developed by industry. Academic researchers are also actively involved in this area, developing novel ideas and methodologies to further improve performance, enhance reliability and reduce cost.

The hardware system of an autonomous vehicle is composed of sensors (e.g., camera, LIDAR, radar, ultrasonic sensor, etc.), computing devices and a drive-by-wire vehicle platform [19]. In this section, we first briefly summarize the sensing and computing systems for autonomous driving demonstrated by several major industrial and academic players. Next, we describe the most recent progress in high-performance computing devices to facilitate autonomous driving.

4.1. Sensing systems

The camera is one of the most critical components for visual perception. Typically, the spatial resolution of a camera in an autonomous vehicle ranges from 0.3 megapixel to 2 megapixels [20,21]. A camera can generate a video stream at 10–30 fps and capture important objects such as traffic lights, traffic signs, obstacles, etc., in real time.
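To give a feel for the resulting processing load, the following back-of-envelope calculation uses the upper end of the resolution and frame-rate ranges quoted above and assumes uncompressed 24-bit RGB pixels.

```python
# Rough raw data rate for a single camera (assumed upper-end settings).
pixels = 2_000_000          # 2 megapixels
bytes_per_pixel = 3         # 8-bit R, G and B
fps = 30
rate_mb_per_s = pixels * bytes_per_pixel * fps / 1e6
print(rate_mb_per_s)        # 180.0 MB/s of raw image data per camera
```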
In addition to the camera, LIDAR is another important sensor. It measures the distance between the vehicle and obstacles by actively illuminating the obstacles with laser beams [22]. Typically, a LIDAR system scans the surrounding environment periodically and generates multiple measurement points. This "cloud" of points can be further processed to compute a 3D map of the surrounding environment [23]. LIDAR is known to be relatively robust and accurate [24], but it is also expensive.

Alternatively, a stereo camera can be used to interpret the 3D environment [25]. It is composed of two or more individual cameras. Knowing the relative spatial locations of all individual cameras, a depth map of the scene can be estimated by triangulating corresponding pixels across the views.
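For a rectified two-camera rig, the triangulation reduces to a simple relation between disparity and depth. The focal length and baseline below are assumed values for illustration, not parameters of any system surveyed here.

```python
# Depth from a rectified stereo pair: depth = focal_length * baseline / disparity.
import numpy as np

focal_px = 1000.0           # focal length in pixels (assumed)
baseline_m = 0.30           # distance between the two cameras (assumed)

def depth_from_disparity(disparity_px):
    d = np.asarray(disparity_px, dtype=float)
    return np.where(d > 0, focal_px * baseline_m / d, np.inf)

print(depth_from_disparity([40, 10, 2]))   # ~7.5 m, 30 m, 150 m
```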
Table 1. Sensing systems for autonomous vehicles (2011–2015): cameras, LIDAR scanners and radars deployed on platforms including CMU [20], V-Charge [21], A1 [29] and BMW [31].
Table 2
Computing systems for autonomous vehicles.

Year | Vehicle | Computing system
2011 | Junior [24] | One Intel Xeon 12-core server and one Intel Xeon 6-core server
2013 | CMU [20] | One computing device equipped with an Intel Atom D525 processor, plus four mini-ITX motherboards equipped with NVIDIA GT530 GPUs and Intel Core 2 Extreme QX9300 processors
2013 | V-Charge [21] | Six personal computers
2014 | A1 [29] | Two embedded industrial computers, a rapid-controller-prototyping electronic computing unit and 13 32-bit microcontroller-based electronic computing units
2015 | BMW [31] | One standard personal computer and one real-time embedded prototyping computer
The A1 car designed by Hanyang University further distributes its computing functions over more devices [29]. It adopts a distributed computing system consisting of two embedded industrial computers, a rapid-controller-prototyping electronic computing unit and 13 microcontroller-based electronic computing units. The two high-performance embedded industrial computers provide the computing power to run sensor fusion, planning and vision algorithms. The rapid-controller-prototyping electronic computing unit is particularly designed for real-time operation and, therefore, is used for time-critical tasks such as vehicle control. The 13 electronic computing units are used for braking, steering, acceleration, etc. To achieve real-time response, these computing devices are placed next to the actuators in order to reduce the communication latency.

While the aforementioned hardware systems have been successfully designed and adopted for real-time operation of autonomous driving, their performance (measured by accuracy, throughput, latency, power, etc.) and cost (measured by price) remain noncompetitive for high-volume production and commercial deployment. Hence, radically new hardware implementations must be developed to address both the technical challenges and the market needs in this field, as will be further discussed in the next sub-section.

4.3. Computing devices

The aforementioned autonomous vehicles have successfully demonstrated their self-driving capabilities by using conventional computing systems. However, their performance and cost are still noncompetitive for commercial deployment, and new computing devices must be developed to improve performance, enhance reliability and reduce cost. In this sub-section, we review the recent advances in the field driven by major industrial and academic players.

4.3.1. Graphics processing units

The graphics processing unit (GPU) is conventionally designed and used for graphics processing tasks. Over the past decades, the advance of GPUs has been driven by the real-time performance requirements of complex, high-resolution 3D scenes in computer games, where tremendous parallelism is inherent [32]. Today, the general-purpose graphics processing unit (GPGPU) is also widely used for high-performance computing (HPC). It has demonstrated promising performance for scientific applications such as cardiac bidomain simulation [33], biomolecular modeling [34], quantum Monte Carlo [35], etc. A GPU contains hundreds or even thousands of parallel processors and can achieve substantially higher throughput than a CPU when running massively parallel algorithms. To reduce the complexity of GPU programming, multiple parallel programming tools such as CUDA [36] and OpenCL [37] have been developed.
Many computer vision and machine learning algorithms used for automotive perception are inherently parallel and, therefore, fit the aforementioned GPU architecture. For example, the convolutional neural network (CNN) is a promising technique for autonomous perception [38,39]. The computational cost of evaluating a CNN is dominated by the convolution operations between the neuron layers and a number of spatial filters, which can be substantially accelerated by a GPU. Therefore, a large number of computer vision and deep learning tools such as OpenCV [40] and Caffe [41] have taken advantage of GPUs to improve throughput.
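As a hedged illustration of this offloading, the snippet below runs a single convolution layer on a GPU when one is available. The layer size, batch size and image resolution are arbitrary assumptions, and PyTorch is used here simply as a representative GPU-accelerated framework (the tools named above are OpenCV and Caffe).

```python
# Offloading a convolution layer to the GPU with PyTorch.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
conv = nn.Conv2d(in_channels=3, out_channels=64,
                 kernel_size=3, padding=1).to(device)
frames = torch.randn(8, 3, 480, 640, device=device)  # a small batch of frames
with torch.no_grad():
    activations = conv(frames)   # executed on the GPU when one is available
```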
For this reason, the GPU has been considered a promising computing device for the application of autonomous driving. However, GPUs often consume a large amount of energy. For instance, an NVIDIA Tesla K40 is used in [38], and its power consumption is around 235 W. Such a high power consumption poses two critical issues. First, it increases the load on the power generation system inside the vehicle. Second, and more importantly, it makes heat dissipation extremely challenging because the environmental temperature inside a vehicle is often significantly higher than normal room temperature. To address these issues, various efforts have been made to design and implement mobile GPUs with reduced power consumption.

For instance, NVIDIA has released its mobile GPU Tegra X1, implemented in a TSMC 20 nm technology [42]. It is composed of a 256-CUDA-core GPU and two quad-core ARM CPU clusters, as shown in Fig. 1. It also contains an end-to-end 4K 60 fps pipeline that supports high-performance video encoding, decoding and display. In addition, it offers a number of I/O interfaces such as USB 3.0, HDMI, serial peripheral interface, etc. The two ARM CPU clusters are implemented with different options: (i) a high-performance quad-core ARM Cortex-A57, and (ii) a power-efficient quad-core ARM Cortex-A53. When running a given set of applications, the system can switch between the high-performance and low-power cores to achieve maximum power efficiency as well as optimal performance. Tegra X1 is one of the key chips in the DRIVE PX Auto-Pilot Platform marketed by NVIDIA for autonomous driving [42].

At its peak performance, Tegra X1 offers over 1 TFLOPS for 16-bit operations and over 500 GFLOPS for 32-bit operations. It is designed to improve power efficiency by optimizing its computing cores, reorganizing its GPU architecture, improving memory compression, and adopting the 20 nm technology. While a conventional GPU consumes

Fig. 1. Simplified architecture for NVIDIA Tegra X1 [42].
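Combining the peak-throughput figure above with the earlier observation that a CNN may require billions of floating-point operations per image gives a rough real-time budget. All numbers in this sketch, including the assumed per-frame cost and sustained utilization, are illustrative assumptions rather than measured values.

```python
# Back-of-envelope real-time budget for a CNN on an embedded GPU.
peak_flops = 1.0e12          # ~1 TFLOPS at FP16 (figure quoted above)
flops_per_frame = 5.0e9      # assumed CNN cost: 5 GFLOPs per image
utilization = 0.3            # assumed sustained fraction of peak
fps_budget = peak_flops * utilization / flops_per_frame
print(fps_budget)            # ~60 frames per second under these assumptions
```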
several design iterations are often required. Once the ASIC design is validated in silicon, it can be manufactured in high volume and the average cost per chip can be greatly reduced. For autonomous vehicles, the market size is tremendous and, therefore, it justifies the high NRE cost of an ASIC design. In addition, the visual perception algorithms for autonomous driving have become relatively mature, thereby eliminating the risk that the ASIC implementation becomes outdated during its long design cycle.
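The economics can be illustrated with a one-line amortization of NRE cost over production volume; both dollar figures below are hypothetical and serve only to show why a large market makes the ASIC route attractive.

```python
# Amortizing a one-time NRE cost over production volume (hypothetical numbers).
nre_cost = 20_000_000        # assumed one-time design and mask cost, in dollars
unit_cost = 15               # assumed manufacturing cost per chip, in dollars
for volume in (10_000, 1_000_000, 10_000_000):
    per_chip = unit_cost + nre_cost / volume
    print(volume, round(per_chip, 2))   # 2015.0, 35.0, 17.0 dollars per chip
```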
For instance, Mobileye has launched the EyeQ SoC to implement computationally intensive real-time algorithms for ADAS [47,48]. It broadly covers a number of important visual perception algorithms, including lane departure detection, vehicle detection, traffic sign recognition, etc. As shown in Fig. 4, the EyeQ SoC contains two ARM processors and four vision computing engines (i.e., a classifier engine, a tracker engine, a lane detection engine, and a window, pre-processing and filter engine).

Fig. 4. Simplified architecture for the EyeQ SoC implemented by Mobileye [47].

In the aforementioned architecture, one of the ARM processors is used to manage the vision computing engines as well as the other ARM processor. The other ARM processor is used for intensive computing tasks. The classifier engine is designed for image scaling, preprocessing and pattern classification. The tracker engine is used for image warping and tracking. The lane detection engine identifies lane markers as well as road geometries. The window, preprocessing and filter engine is designed to convolve images, create image pyramids, detect edges, and filter images. Furthermore, a direct memory access (DMA) component is used for both on-chip and off-chip data transmission under the control of the ARM processor.
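To make the role of the window, preprocessing and filter engine more tangible, the sketch below reproduces the same kinds of operations (an image pyramid, edge detection and smoothing) with OpenCV on a general-purpose processor. It illustrates the operations only, not Mobileye's hardware, and "frame.jpg" is a placeholder path.

```python
# Pyramid construction, edge detection and filtering on a CPU with OpenCV.
import cv2

frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path

# Three-level image pyramid: each level halves the resolution.
pyramid = [frame]
for _ in range(2):
    pyramid.append(cv2.pyrDown(pyramid[-1]))

# Edge map and a simple smoothing filter at the finest level.
edges = cv2.Canny(pyramid[0], 50, 150)
smoothed = cv2.GaussianBlur(pyramid[0], (5, 5), 0)
```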
Recently, Mobileye has implemented the EyeQ2 SoC, an upgraded version of the EyeQ SoC, as shown in Fig. 5. It covers several additional applications including pedestrian protection, head lamp control, adaptive cruise control, headway monitoring and warning, etc. Different from the EyeQ SoC, the ARM processors are replaced by MIPS processors. Furthermore, three vector microcode processors with single instruction multiple data (SIMD) and very long instruction word (VLIW) support are added. In addition, the lane detection engine is removed, while two other vision computing engines are added for a feature-based classifier and stereo vision.

Fig. 5. Simplified architecture for the EyeQ2 SoC implemented by Mobileye [48].

Similar to the EyeQ SoC, one of the MIPS processors is used to control the vision computing engines, the vector microcode modules, the DMA and the other MIPS processor. The other MIPS processor, together with the vision computing engines, performs the computationally intensive tasks.

Besides Mobileye, Texas Instruments has developed the TDA3x SoC for ADAS [49]. It offers a variety of functions such as autonomous emergency braking, lane keep assist, advanced cruise control, traffic sign recognition, pedestrian and object detection, forward collision warning, etc. [50]. The simplified architecture of TDA3x is shown in Fig. 6. It uses a heterogeneous architecture composed of a DSP, an embedded vision engine, an ARM core and an image signal processor. The DSP unit can operate at 750 MHz and contains two floating-point multipliers and six arithmetic units. The embedded vision engine is a vector processor operating at 650 MHz and is optimized for computer vision algorithms. The heterogeneous architecture of TDA3x facilitates multiple ADAS functions in real time.

Fig. 6. Simplified architecture for the TDA3x SoC implemented by Texas Instruments [49].

More recently, a number of advanced system architectures have been proposed to facilitate efficient implementation of deep learning algorithms. The tensor processing unit (TPU) by Google [51] and the dataflow processing unit (DPU) by Wave Computing are two such examples.

5. Conclusions

In this paper, we briefly summarize the recent progress on visual perception algorithms and the corresponding hardware implementations to facilitate autonomous driving. In particular, a variety of algorithms are discussed for vehicle and pedestrian detection, lane detection and drivable surface detection. On the other hand, CPU, GPU, FPGA and ASIC are presented as the major components to form an efficient hardware platform for real-time computing and operation. While significant technical advances have been accomplished in this area, there remains a strong need to further improve both algorithm and hardware designs in order to make autonomous vehicles safe, reliable and comfortable. The technical challenges can be broadly classified into three categories:
• Algorithm design: Accurate and robust algorithms are needed to handle all corner cases so that an autonomous vehicle can operate appropriately in these scenarios. Such robustness is particularly important in order to ensure safety.

References

[21] … K. Köser, M. Beermann, C. Häne, L. Heng, G.H. Lee, F. Fraundorfer, R. Iser, R. Triebel, I. Posner, P. Newman, L. Wolf, M. Pollefeys, S. Brosig, J. Effertz, C. Pradalier, R. Siegwart, Toward automated driving in cities using close-to-market sensors: an overview of the V-Charge project, IEEE IV (2013) 809–816.
[22] J. Zolock, C. Senatore, R. Yee, R. Larson, B. Curry, The use of stationary object …
[50] M. Mody, P. Swami, K. Chitnis, S. Jagannathan, K. Desappan, A. Jain, D. Poddar, Z. Nikolic, P. Viswanath, M. Mathew, S. Nagori, H. Garud, High performance front camera ADAS applications on TI's TDA3X platform, High Perform. Comput. (2015) 456–463.
[51] N. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, R. Boyle, P. Cantin, C. Chao, C. Clark, J. Coriell, M. Daley, M. Dau, J. Dean, B. Gelb, T. Ghaemmaghami, R. Gottipati, W. Gulland, R. Hagmann, C. Ho, D. Hogberg, J. Hu, R. Hundt, D. Hurt, J. Ibarz, A. Jaffey, A. Jaworski, A. Kaplan, H. Khaitan, A. Koch, N. Kumar, S. Lacy, J. Laudon, J. Law, D. Le, C. Leary, Z. Liu, K. Lucke, A. Lundin, G. MacKean, A. Maggiore, M. Mahony, K. Miller, R. Nagarajan, R. Narayanaswami, R. Ni, K. Nix, T. Norrie, M. Omernick, N. Penukonda, A. Phelps, J. Ross, M. Ross, A. Salek, E. Samadiani, C. Severn, G. Sizikov, M. Snelham, J. Souter, D. Steinberg, A. Swing, M. Tan, G. Thorson, B. Tian, H. Toma, E. Tuttle, V. Vasudevan, R. Walter, W. Wang, E. Wilcox, D. Yoon, In-datacenter performance analysis of a tensor processing unit, Int. Symp. Comput. Archit. (ISCA) (2017).
[52] W. Shi, M. Alawieh, X. Li, H. Yu, N. Arechiga, N. Tomatsu, Efficient statistical validation of machine learning systems for autonomous driving, IEEE/ACM ICCAD (2016).