Series ISSN: 1935-3235
LIU • ET AL
Synthesis Lectures on
Computer Architecture
Robotic Computing on FPGAs
Shaoshan Liu, PerceptIn
Zishen Wan, Georgia Institute of Technology
Bo Yu, PerceptIn
Yu Wang, Tsinghua University
About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science. Synthesis
books provide concise, original presentations of important research and
development topics, published quickly, in digital and print formats.
Synthesis Lectures on
Computer Architecture
store.morganclaypool.com
Natalie Enright Jerger, Series Editor
Editor
Natalie Enright Jerger, University of Toronto
Editor Emerita
Margaret Martonosi, Princeton University
Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison
Synthesis Lectures on Computer Architecture publishes 50- to 100-page books on topics pertaining to
the science and art of designing, analyzing, selecting, and interconnecting hardware components to
create computers that meet functional, performance, and cost goals. The scope will largely follow
the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and
ASPLOS.
Robotic Computing on FPGAs
Shaoshan Liu, Zishen Wan, Bo Yu, and Yu Wang
2021
AI for Computer Architecture: Principles, Practice, and Prospects
Lizhong Chen, Drew Penney, and Daniel Jiménez
2020
Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale
Production
Andres Rodriguez
2020
Parallel Processing, 1980 to 2020
Robert Kuhn and David Padua
2020
Data Orchestration in Deep Learning Accelerators
Tushar Krishna, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, and Ananda Samajdar
2020
Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015
Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015
Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015
Shared-Memory Synchronization
Michael L. Scott
2013
Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013
Performance Analysis and Tuning for General Purpose Graphics Processing Units
(GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012
On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009
The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009
Transactional Memory
James R. Larus and Ravi Rajwar
2006
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
DOI 10.2200/S01101ED1V01Y202105CAC056
Lecture #56
Series ISSN
Print 1935-3235 Electronic 1935-3243
Robotic Computing
on FPGAs
Shaoshan Liu
PerceptIn
Zishen Wan
Georgia Institute of Technology
Bo Yu
PerceptIn
Yu Wang
Tsinghua University
Morgan & Claypool Publishers
ABSTRACT
This book provides a thorough overview of state-of-the-art field-programmable gate array
(FPGA)-based robotic computing accelerator designs and summarizes the optimization
techniques they adopt. This book consists of ten chapters, delving into the details of how FPGAs have been
utilized in robotic perception, localization, planning, and multi-robot collaboration tasks. In
addition to individual robotic tasks, this book provides detailed descriptions of how FPGAs have
been used in robotic products, including commercial autonomous vehicles and space exploration
robots.
KEYWORDS
robotics, FPGAs, autonomous machines, perception, localization, planning, control,
space exploration, deep learning
Contents
Preface
2 FPGA Technologies
2.1 An Introduction to FPGA Technologies
2.1.1 Types of FPGAs
2.1.2 FPGA Architecture
2.1.3 Commercial Applications of FPGAs
2.2 Partial Reconfiguration
2.2.1 What is Partial Reconfiguration?
2.2.2 How to Use Partial Reconfiguration?
2.2.3 Achieving High Performance
2.2.4 Real-World Case Study
2.3 Robot Operating System (ROS) on FPGAs
2.3.1 Robot Operating System (ROS)
2.3.2 ROS-Compliant FPGAs
2.3.3 Optimizing Communication Latency for the ROS-Compliant FPGAs
2.4 Summary
5 Localization on FPGAs
5.1 Preliminary
5.1.1 Context
5.1.2 Algorithm Overview
5.2 Algorithm Framework
5.3 Frontend FPGA Design
5.3.1 Overview
5.3.2 Exploiting Task-Level Parallelisms
5.4 Backend FPGA Design
5.5 Evaluation
5.5.1 Experimental Setup
5.5.2 Resource Consumption
5.5.3 Performance
5.6 Summary
6 Planning on FPGAs
6.1 Motion Planning Context Overview
6.1.1 Probabilistic Roadmap
6.1.2 Rapidly Exploring Random Tree
6.2 Collision Detection on FPGAs
6.2.1 Motion Planning Compute Time Profiling
6.2.2 General Purpose Processor-Based Solutions
6.2.3 Specialized Hardware Accelerator-Based Solutions
6.2.4 Evaluation and Discussion
6.3 Graph Search on FPGAs
6.4 Summary
10 Conclusion
10.1 What we Have Covered in This Book
10.2 Looking Forward
Bibliography
Preface
In this book, we provide a thorough overview of state-of-the-art FPGA-based robotic computing
accelerator designs and summarize the optimization techniques they adopt. The authors have a
combined total of over 40 years of research experience in utilizing FPGAs in robotic applications,
both in academic research and in commercial deployments. For instance, the authors have
demonstrated that, by co-designing the software and hardware, FPGAs can achieve more than 10×
better performance and energy efficiency than CPU and GPU implementations. The
authors have also pioneered the use of the partial reconfiguration methodology in FPGA
implementations to further improve design flexibility and reduce overhead. In addition,
the authors have successfully developed and shipped commercial robotic products powered by
FPGAs, demonstrating that FPGAs have excellent potential and are promising
candidates for robotic computing acceleration due to their high reliability, adaptability, and power
efficiency.
The authors believe that FPGAs are the best compute substrate for robotic applications
for several reasons. First, robotic algorithms are still evolving rapidly, and thus any ASIC-based
accelerators will be months or even years behind the state-of-the-art algorithms. FPGAs, on the
other hand, can be dynamically updated as needed. Second, robotic workloads are highly
diverse, so it is difficult for any ASIC-based robotic computing accelerator to reach economies
of scale in the near future. FPGAs, on the other hand, are a cost-effective and energy-effective
alternative before one type of accelerator reaches economies of scale. Third, compared to
systems on a chip (SoCs) that have reached economies of scale, e.g., mobile SoCs, FPGAs deliver a
significant performance advantage. Fourth, partial reconfiguration allows multiple robotic
workloads to time-share an FPGA, thus allowing one chip to serve multiple applications, leading to
overall cost and energy reduction.
Specifically, FPGAs require little power and are often built into small systems with less
memory. They are capable of massively parallel computation and can exploit the properties
of perception (e.g., stereo matching), localization (e.g., simultaneous localization and
mapping (SLAM)), and planning (e.g., graph search) kernels to eliminate extra logic and
thereby simplify the end-to-end system implementation. Taking hardware characteristics into
account, several algorithms have been proposed that run in a hardware-friendly way and
achieve performance similar to their software counterparts. Therefore, FPGAs can meet real-time
requirements while achieving high energy efficiency compared to central processing units (CPUs) and
graphics processing units (GPUs). In addition, unlike their application-specific integrated circuit
(ASIC) counterparts, FPGA technologies provide the flexibility of on-site programming and
re-programming without going through re-fabrication with a modified design. Partial
Reconfiguration (PR) takes this flexibility one step further, allowing the modification of an operating
FPGA design by loading a partial configuration file. Using PR, part of the FPGA can be
reconfigured at runtime without compromising the integrity of the applications running on those parts
of the device that are not being reconfigured. As a result, PR allows different robotic
applications to time-share part of an FPGA, leading to energy and performance efficiency, and making
the FPGA a suitable computing platform for dynamic and complex robotic workloads. Due to these
advantages over other compute substrates, FPGAs have been successfully utilized in commercial
autonomous vehicles as well as in space robotic applications, because FPGAs offer unprecedented
flexibility and significantly reduce the design cycle and development cost.
This book consists of ten chapters, providing a thorough overview of how FPGAs have
been utilized in robotic perception, localization, planning, and multi-robot collaboration tasks.
In addition to individual robotic tasks, we provide detailed descriptions of how FPGAs have
been used in robotic products, including commercial autonomous vehicles and space exploration
robots.
Shaoshan Liu
June 2021
CHAPTER 1
Introduction and Overview
1.1 SENSING
The sensing stage is responsible for extracting meaningful information from the raw sensor data.
To enable intelligent actions and improve reliability, the robot platform usually supports a wide
range of sensors. The number and type of sensors depend heavily on the specifications of
the workload and the capability of the onboard compute platform. The sensors can include the
following:
Cameras. Cameras are usually used for object recognition and object tracking, such as
lane detection in autonomous vehicles and obstacle detection in drones. RGB-D cameras
can also be used to determine object distances and positions. In autonomous vehicles, for
example, current systems usually mount eight or more 1080p cameras around the vehicle
to detect, recognize, and track objects in different directions, which can greatly improve safety.
These cameras usually run at 60 Hz and, combined, generate multiple gigabytes of raw data
per second.
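The "multiple gigabytes per second" figure can be checked with a short calculation. The sketch below is only a back-of-the-envelope estimate: the camera count, resolution, and frame rate come from the text, while 3 bytes per pixel (uncompressed 24-bit RGB) is an assumption, and the function name is illustrative.

```python
def raw_camera_bandwidth(num_cameras, width, height, bytes_per_pixel, fps):
    """Return the aggregate uncompressed data rate in bytes per second."""
    return num_cameras * width * height * bytes_per_pixel * fps


# Eight 1080p cameras at 60 Hz (from the text).
# bytes_per_pixel=3 is an assumption: uncompressed 24-bit RGB.
rate = raw_camera_bandwidth(
    num_cameras=8, width=1920, height=1080, bytes_per_pixel=3, fps=60
)
print(f"{rate / 1e9:.2f} GB/s")  # prints "2.99 GB/s"
```

With a different pixel format (for example, raw Bayer data at 1 byte per pixel) the total would be about one third of this, which is still roughly a gigabyte per second.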
GNSS/IMU. The global navigation satellite system (GNSS) and inertial measurement
unit (IMU) system helps the robot localize itself by reporting both inertial updates and an
estimate of the global location at a high rate. Different robots have different requirements for
localization sensing. For instance, 10 Hz may be enough for a low-speed mobile robot, but
high-speed autonomous vehicles usually demand 30 Hz or higher, and high-speed drones
may need 100 Hz or more, so we face a wide spectrum of
sensing speeds. Fortunately, different sensors have complementary advantages and drawbacks. GNSS
enables fairly accurate localization, but it runs at only 10 Hz and thus cannot provide real-time
updates. By contrast, both the accelerometer and the gyroscope in an IMU can run at 100–200 Hz, which
satisfies the real-time requirement. However, an IMU suffers from bias wandering over time and
perturbation by thermo-mechanical noise, which may degrade the accuracy of the
position estimates. By combining GNSS and IMU, we can obtain accurate, real-time updates
for robots.
LiDAR. Light detection and ranging (LiDAR) measures distance by illuminating
obstacles with laser pulses and measuring the reflection time. These pulses, along
with other recorded data, yield precise three-dimensional information about the
surroundings. LiDAR plays an important role in localization and in obstacle detection
and avoidance. As indicated in [20], the choice of sensors dictates the algorithm and hardware
design. In autonomous driving, for instance, almost all autonomous vehicle companies,
including Uber, Waymo, and Baidu, use LiDAR at the core of their technologies.
PerceptIn and Tesla are among the very few that do not use LiDAR and instead rely on cameras
and vision-based systems. In particular, PerceptIn’s data demonstrated that, for the low-speed
autonomous driving scenario, LiDAR processing is slower than camera-based vision processing
and also increases power consumption and cost.
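The ranging principle described above (distance from reflection time) reduces to the time-of-flight relation d = c · t / 2, where the factor of two accounts for the round trip. The following is a minimal sketch; the function name and the 200 ns example value are illustrative and not taken from the book.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def lidar_range_m(round_trip_time_s):
    """Distance to the reflecting surface for a measured round-trip time."""
    # The pulse travels to the obstacle and back, hence the division by two.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# Hypothetical measurement: a reflection arrives 200 ns after the pulse is emitted.
print(f"{lidar_range_m(200e-9):.2f} m")  # prints "29.98 m"
```

The example also hints at the timing resolution involved: one nanosecond of round-trip time corresponds to about 15 cm of range.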
Radar and Sonar. Radio Detection and Ranging (Radar) and Sound Navigation and
Ranging (Sonar) systems determine the distance to and speed of an object, and
usually serve as the last line of defense for obstacle avoidance. In autonomous vehicles, for
example, when nearby obstacles are detected and there is a danger of collision, the vehicle
applies the brakes or turns to avoid them. Compared to LiDAR, Radar and Sonar systems are
cheaper and smaller, and their raw data is usually fed directly to the control processor
without going through the main compute pipeline, which makes it possible to implement urgent
functions such as swerving or applying the brakes.
One key problem we have observed with commercial CPUs, GPUs, and mobile SoCs is
the lack of built-in multi-sensor processing support; hence, most multi-sensor processing
has to be done in software, which can lead to problems with time synchronization. On
the other hand, FPGAs provide a rich sensor interface and enable most time-critical sensor
4 1. INTRODUCTION AND OVERVIEW
[Figure: system stack diagram; surviving labels: Operating System, Hardware Platform.]
1.2 PERCEPTION
The sensor data is then fed into the perception layer to sense the static and dynamic objects as
well as build a reliable and detailed representation of the robot’s environment by using computer
vision techniques (including deep learning).
The perception layer is responsible for object detection, segmentation, and tracking. There
are obstacles, lane dividers, and other objects to detect. Traditionally, a detection pipeline starts
with image pre-processing, followed by a region-of-interest detector, and finally a classifier that
outputs detected objects. In 2005, Dalal and Triggs [22] proposed an algorithm based on the
histogram of oriented gradients (HOG) and support vector machine (SVM) to model both the
appearance and shape of the object under various conditions. The goal of segmentation is to give
the robot a structured understanding of its environment. Semantic segmentation is usually
formulated as a graph labeling problem in which the vertices of the graph are pixels or super-pixels.
Inference algorithms on graphical models such as the conditional random field (CRF) [23, 24] are
used. The goal of tracking is to estimate the trajectory of moving obstacles. Tracking can be
formulated as a sequential Bayesian filtering problem by recursively running a prediction step
and a correction step. Tracking can also be formulated as tracking-by-detection with a
Markov decision process (MDP) [25], where an object detector is applied to consecutive
frames and detected objects are linked across frames.
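To make "detected objects are linked across frames" concrete, the sketch below links bounding boxes between two frames. It is not the MDP formulation of [25]; it swaps in a much simpler greedy intersection-over-union (IoU) association, and the box format, the 0.3 threshold, and all function names are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(tracks, detections, next_id, min_iou=0.3):
    """Greedily link each detection to the unmatched track with the highest IoU.

    tracks maps track id -> last known box. Detections that match no track
    start a new track; tracks that match no detection are dropped (a
    simplification made in this sketch).
    """
    updated = {}
    unmatched = dict(tracks)
    for det in detections:
        best_id, best_score = None, min_iou
        for track_id, box in unmatched.items():
            score = iou(box, det)
            if score >= best_score:
                best_id, best_score = track_id, score
        if best_id is None:
            best_id = next_id  # no sufficiently overlapping track: new object
            next_id += 1
        else:
            del unmatched[best_id]  # each track is matched at most once
        updated[best_id] = det
    return updated, next_id


# Two hypothetical objects that each move one pixel to the right between frames.
frame1 = [(0, 0, 10, 10), (50, 50, 60, 60)]
frame2 = [(51, 50, 61, 60), (1, 0, 11, 10)]
tracks, next_id = link_detections({}, frame1, next_id=0)
tracks, next_id = link_detections(tracks, frame2, next_id)
print(tracks)  # track 0 and track 1 keep their ids despite the reordered detections
```

A production tracker would typically add a motion model and keep unmatched tracks alive for a few frames; both are left out here to keep the linking step visible.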
In recent years, deep neural networks (DNNs), also known as deep learning, have greatly
affected the field of computer vision and made significant progress in solving robot
perception problems. Most state-of-the-art algorithms now apply a type of neural network based
on the convolution operation, the convolutional neural network (CNN). Fast R-CNN [26],
Faster R-CNN [27], SSD [28], YOLO [29], and YOLO9000 [30] achieve much better speed
and accuracy in object detection.
Most CNN-based semantic segmentation work is based on Fully Convolutional Networks
(FCNs) [31], and there is recent work on the spatial pyramid pooling network [32] and the
pyramid scene parsing network (PSPNet) [33] that combines global image-level information with
locally extracted features. By using auxiliary natural images, a stacked autoencoder model can be
trained offline to learn generic image features and then applied to online object tracking [34].
In Chapter 3, we review state-of-the-art neural network accelerator designs and
demonstrate that, with software-hardware co-design, FPGAs can achieve more than 10 times
better speed and energy efficiency than state-of-the-art GPUs. This shows that FPGAs are
a promising candidate for neural network acceleration. In Chapter 4, we review various stereo
vision algorithms in robotic perception and their FPGA accelerator designs. We demonstrate
that, with careful algorithm-hardware co-design, FPGAs can achieve two orders of magnitude
higher energy efficiency and performance than state-of-the-art GPUs and CPUs.
1.3 LOCALIZATION
The localization layer is responsible for aggregating data from various sensors to locate the robot
in the environment model.
A GNSS/IMU system can be used for localization. GNSS comprises several satellite systems,
such as GPS, Galileo, and BeiDou, which provide accurate localization results but at a
slow update rate. In comparison, an IMU provides fast updates but with less accurate rotation and
acceleration results. A mathematical filter, such as a Kalman filter, can be used to combine the
advantages of the two and minimize the localization error and latency. However, this system
alone has some problems: the signal may bounce off obstacles and pick up additional noise, and
it fails to work in enclosed environments.
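The following is a minimal sketch of how a Kalman filter can combine the two sources: a fast, drifting IMU-based prediction and a slow, accurate GNSS correction. It is a one-dimensional scalar filter under simplifying assumptions that are not from the book: the IMU is reduced to a velocity reading with a constant bias, the GNSS readings are noise-free so that the run is deterministic, and the noise parameters q and r are arbitrary illustrative values.

```python
def predict(x, p, velocity, dt, q):
    """Prediction step: advance the position estimate with the IMU-derived velocity."""
    return x + velocity * dt, p + q  # uncertainty grows with every prediction


def correct(x, p, z, r):
    """Correction step: blend in a GNSS position measurement z with variance r."""
    k = p / (p + r)  # Kalman gain: how much to trust the measurement
    return x + k * (z - x), (1.0 - k) * p


def simulate(steps=100, dt=0.01, true_v=1.0, imu_bias=0.1, q=1e-4, r=1e-2):
    """Run 100 Hz IMU prediction with a GNSS correction every 10th step (10 Hz)."""
    truth = 0.0           # ground-truth position
    dead_reckoning = 0.0  # IMU-only estimate, drifts because of the bias
    x, p = 0.0, 1.0       # fused estimate and its variance
    for step in range(1, steps + 1):
        truth += true_v * dt
        imu_v = true_v + imu_bias  # assumed constant IMU bias
        dead_reckoning += imu_v * dt
        x, p = predict(x, p, imu_v, dt, q)
        if step % 10 == 0:
            # Assumption: GNSS reports the true position without noise.
            x, p = correct(x, p, truth, r)
    return truth, dead_reckoning, x


truth, dead_reckoning, fused = simulate()
print(f"IMU-only error: {abs(dead_reckoning - truth):.3f} m")
print(f"Fused error:    {abs(fused - truth):.3f} m")
```

In this run the IMU-only estimate drifts by about 0.1 m after one second, while the fused estimate stays closer to the truth because each GNSS update pulls it back. A realistic filter would track velocity and bias in the state vector as well.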
LiDAR and high-definition (HD) maps can be used for localization. LiDAR generates
point clouds and provides a shape description of the environment, although it is hard to differentiate
individual points. An HD map has a higher resolution than a conventional digital map and makes the route
familiar to the robot; the key is to fuse information from different sensors to minimize the errors
in each grid cell. Once the HD map is built, a particle filter method can be applied to localize
the robot in real time by correlating against LiDAR measurements. However, LiDAR performance
may be severely affected by weather conditions (e.g., rain, snow), which introduces localization error.
Cameras can be used for localization as well. A simplified pipeline for vision-based
localization is as follows: (1) by triangulating stereo image pairs, a disparity map is obtained and used
to derive depth information for each point; (2) by matching salient features between successive
stereo image frames to establish correlations between feature points in different frames,
the motion between the past two frames is estimated; and (3) by comparing the salient features
against those in the known map, the current position of the robot is derived [35].
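Step (1) of the pipeline above relies on the standard rectified-stereo relation Z = f · B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity in pixels. The sketch below applies it to a single point; the camera parameters are hypothetical values chosen for illustration.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point from its stereo disparity, assuming rectified images."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical stereo rig: 800-pixel focal length, 25 cm baseline.
print(depth_from_disparity(focal_length_px=800.0, baseline_m=0.25, disparity_px=40.0))
# prints 5.0 (meters)
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for nearby ones.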
Apart from these techniques, a sensor fusion strategy is also often used to combine
multiple sensors for localization, which can improve the reliability and robustness of the robot [36, 37].
In Chapter 5, we introduce a general-purpose localization framework that integrates key
primitives in existing algorithms along with its implementation in FPGA. The FPGA-based
localization framework retains high accuracy of individual algorithms, simplifies the software
stack, and provides a desirable acceleration target.
[Figure: autonomous driving processing pipeline. Modules shown: Camera, LiDAR, 2D Perception,
3D Perception, Perception Fusion, Tracking, Prediction, Planning, Control, and Localization,
with connections annotated with data rates of 10 Hz, 30 Hz, and 100 Hz.]
3D Perception module as well as the Localization module; the GNSS/IMUs generate positional
updates at 100 Hz and feed the raw data to the Localization module; and the mmWave radars detect
obstacles at 10 FPS and feed the raw data to the Perception Fusion module.
Next, the results of 2D and 3D Perception Modules are fed into the Perception Fusion
module at 30 Hz and 10 Hz, respectively, to create a comprehensive perception list of all detected
objects. The perception list is then fed into the Tracking module at 10 Hz to create a tracking list
of all detected objects. The tracking list is then fed into the Prediction module at 10 Hz to create
a prediction list of all objects. After that, both the prediction results and the localization results
are fed into the Planning module at 10 Hz to generate a navigation plan. The navigation plan is
then fed into the Control module at 10 Hz to generate control commands, which are finally sent
to the autonomous vehicle for execution at 100 Hz.
Hence, every 10 ms, the autonomous vehicle needs to generate a control command to
maneuver the vehicle. If any upstream module, such as the Perception module, misses the deadline
to generate an output, the Control module still has to generate a command before the deadline.
This could lead to disastrous results, as the autonomous vehicle is essentially driving blindly
without the perception output.
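The deadlines implied by the rates above can be made explicit by converting each rate to a period. The sketch below uses only the rates stated in the text; the module labels, the helper names, and the staleness rule (an output older than its producer's period counts as a missed deadline) are illustrative assumptions.

```python
# Rates stated in the text, in Hz.
RATES_HZ = {
    "2D Perception": 30,
    "3D Perception": 10,
    "Perception Fusion": 10,
    "Tracking": 10,
    "Prediction": 10,
    "Planning": 10,
    "Control": 10,
    "Command execution": 100,
}


def period_ms(rate_hz):
    """Time budget per output, in milliseconds, for a module running at rate_hz."""
    return 1000.0 / rate_hz


def is_stale(output_age_ms, producer_rate_hz):
    """Assumed rule: an output older than one producer period missed its deadline."""
    return output_age_ms > period_ms(producer_rate_hz)


for module, rate in RATES_HZ.items():
    print(f"{module:18s} {rate:4d} Hz -> {period_ms(rate):6.1f} ms per output")

# A perception list that is 150 ms old has missed the 100 ms deadline of a
# 10 Hz producer, so a command issued now would be based on outdated data.
print(is_stale(output_age_ms=150.0, producer_rate_hz=10))  # prints True
```

The 10 ms period of the 100 Hz command stream is the deadline that the paragraph above refers to.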
The key challenge is to design a system to minimize the end-to-end latency of the deep
processing pipeline within energy and cost constraints, and with minimum latency variation.
In this book, we demonstrate that FPGAs can be utilized in different modules in this long
processing pipeline to minimize latency, reduce latency variation, and achieve energy efficiency.
1.7 SUMMARY
The authors believe that FPGAs are the indispensable compute substrate for robotic applications
for several reasons.
• First, robotic algorithms are still evolving rapidly. Thus, any ASIC-based accelerators
will be months or even years behind the state-of-the-art algorithms; on the other hand,
FPGAs can be dynamically updated as needed.
• Second, robotic workloads are highly diverse. Thus, it is difficult for any ASIC-based
robotic computing accelerator to reach economies of scale in the near future; on the
other hand, FPGAs are a cost-effective and energy-effective alternative before one type
of accelerator reaches economies of scale.
• Third, compared to SoCs that have reached economies of scale, e.g., mobile SoCs,
FPGAs deliver a significant performance advantage.
• Fourth, partial reconfiguration allows multiple robotic workloads to time-share an
FPGA, thus allowing one chip to serve multiple applications, leading to overall cost
and energy reduction.
Specifically, FPGAs require little power and are often built into small systems with less
memory. They are capable of massively parallel computation and can exploit the properties
of perception (e.g., stereo matching), localization (e.g., SLAM), and planning (e.g., graph
search) kernels to eliminate extra logic and simplify the implementation. Taking hardware
characteristics into account, several algorithms have been proposed that run in a
hardware-friendly way and achieve performance similar to their software counterparts. Therefore,
FPGAs can meet real-time requirements while achieving high energy efficiency compared to CPUs and GPUs.
Unlike their ASIC counterparts, FPGAs provide the flexibility of on-site programming and
re-programming without going through re-fabrication with a modified design. PR takes this
flexibility one step further, allowing the modification of an operating FPGA design by loading a
partial configuration file. Using PR, part of the FPGA can be reconfigured at runtime without
compromising the integrity of the applications running on those parts of the device that are
not being reconfigured. As a result, PR allows different robotic applications to time-share
part of an FPGA, leading to energy and performance efficiency, and making the FPGA a suitable
computing platform for dynamic and complex robotic workloads.
Due to these advantages over other compute substrates, FPGAs have been successfully
utilized in commercial autonomous vehicles. In particular, over the past four years, PerceptIn has
built and commercialized autonomous vehicles for micromobility, and PerceptIn’s products have
been deployed in China, the U.S., Japan, and Switzerland. In this book, we provide a real-world
case study on how PerceptIn developed its computing system by relying heavily on FPGAs,
which perform not only heterogeneous sensor synchronization but also the acceleration of
software components on the critical path. In addition, FPGAs are used heavily in space robotic
applications, because FPGAs offer unprecedented flexibility and significantly reduce the design
cycle and development cost.
CHAPTER 2
FPGA Technologies
Before we delve into utilizing FPGAs for accelerating robotic workloads, in this chapter we
first provide the background of FPGA technologies so that readers without prior knowledge
can grasp the basic understanding of what an FPGA is and how an FPGA works. We also
introduce partial reconfiguration, a technique that exploits the flexibility of FPGAs and one
that is extremely useful for various robotic workloads to time-share an FPGA so as to minimize
energy consumption and resource utilization. In addition, we explore existing techniques that
enable the robot operating system (ROS), an essential infrastructure for robotic computing, to
run directly on FPGAs.
• Antifuse FPGAs are non-volatile and have a minimal delay due to routing, resulting
in a faster speed and lower power consumption. The drawback is evident as they have a
relatively more complicated fabrication process and are only one-time programmable.
• SRAM-based FPGAs are field reprogrammable and use the standard fabrication pro-
cess that foundries put in significant effort in optimizing, resulting in a faster rate of
performance increase. However, based on SRAM, these FPGAs are volatile and may
not hold configuration if a power glitch occurs. Also, they have more substantial rout-
ing delays, require more power, and have a higher susceptibility to bit errors. Note that
SRAM-based FPGAs are the most popular compute substrates in space applications.
• Flash-based FPGAs are non-volatile and reprogrammable, and also have low power
consumption and routing delay. The major drawback is that runtime reconfiguration is
not recommended for flash-based FPGAs because of the potentially destructive results if
radiation effects occur during the reconfiguration process [52]. The stability of the
stored charge on the floating gate is also a concern: it depends on factors such
as the operating temperature and the electric fields that might disturb the charge. As a result,
flash-based FPGAs are not as frequently used in space missions [53].
• Configurable Logic Blocks (CLBs) are the basic repeating logic resources on an
FPGA. When linked together by the programmable routing blocks, CLBs can exe-
cute complex logic functions, implement memory functions, and synchronize code on
the FPGA. CLBs contain smaller components, including flip-flops (FFs), look-up ta-
bles (LUTs), and multiplexers (MUX). An FF is the smallest storage resource on the
FPGA. Each FF in a CLB is a binary register used to save logic states between clock
cycles on an FPGA circuit. An LUT stores a predefined list of outputs for every com-
bination of inputs. LUTs provide a fast way to retrieve the output of a logic operation
because possible results are stored and then referenced rather than calculated. A MUX
is a circuit that selects between two or more inputs and then returns the selected input.
Any logic can be implemented using the combination of FFs, LUTs, and MUX.
2.1. AN INTRODUCTION TO FPGA TECHNOLOGIES 13
[Figure: FPGA architecture. Labeled elements: configurable logic block (inputs, 4-LUT, FF, and an output selection set by the SRAM configuration bitstream), interconnect wires, DSP, and I/O block (input and output registers, DDR MUXes, 3-state control, clocks ICK1, ICK2, OCK1, OCK2, and PAD).]
• I/O Blocks (IOBs) are used to bridge signals onto the chip and send them back off
again. An IOB consists of an input buffer and an output buffer with three-state and
open-collector output controls. Typically, there are pull-up resistors on the outputs and
sometimes pull-down resistors that can be used to terminate signals and buses without
requiring discrete resistors external to the chip. The polarity of the output can usually
be programmed for active high or active low output. There are typically flip-flops on the
outputs so that clocked signals can be output directly to the pins without encountering
significant delay, more easily meeting the setup time requirement for external devices.
Since there are many IOBs available on an FPGA and these IOBs are programmable,
we can easily design a compute system that connects to different types of sensors, which
is extremely useful in robotic workloads.
• Digital Signal Processors (DSPs) have been optimized to implement various com-
mon digital signal processing functions with maximum performance and minimum
logic resource utilization. In addition to multipliers, each DSP block has functions
that are frequently required in typical DSP algorithms. These functions usually in-
clude pre-adders, adders, subtractors, accumulators, coefficient register storage, and a
summation unit. With these rich features, the DSP blocks in the Stratix series FP-
GAs are ideal for applications with high-performance and computationally intensive
signal processing functions, such as finite impulse response (FIR) filtering, fast Fourier
transforms (FFTs), digital up/down conversion, high-definition (HD) video process-
ing, HD CODECs, etc. Besides the aforementioned traditional workloads, DSPs are
also extremely useful for robotic workloads, especially computer vision workloads, pro-
viding high-performance and low-power solutions for robotic vision front ends [55].
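The CLB description above (a LUT that stores precomputed outputs, a flip-flop that holds state between clock cycles, and a MUX that selects between inputs) can be illustrated with a small software model. This is only a sketch for intuition, not a description of any vendor's CLB: the names `make_lut`, `mux2`, `FlipFlop`, and `LogicBlock` are hypothetical, and the `use_register` flag stands in for a single configuration bit.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute the output of func for every combination of n_inputs bits."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}


def mux2(select, a, b):
    """2-to-1 multiplexer: returns a when select is 0, b when select is 1."""
    return b if select else a


class FlipFlop:
    """One-bit register that changes value only on a clock edge."""

    def __init__(self):
        self.q = 0

    def tick(self, d):
        self.q = d


class LogicBlock:
    """A LUT feeding a flip-flop, with a MUX choosing raw or registered output."""

    def __init__(self, lut, use_register):
        self.lut = lut
        self.ff = FlipFlop()
        self.use_register = use_register  # hypothetical stand-in for a config bit

    def evaluate(self, inputs):
        """Output currently visible: the LUT result or the stored flip-flop value."""
        return mux2(self.use_register, self.lut[tuple(inputs)], self.ff.q)

    def clock_edge(self, inputs):
        """On a clock edge, the flip-flop captures the current LUT output."""
        self.ff.tick(self.lut[tuple(inputs)])


# A 4-input LUT configured as a parity (XOR) function: results are looked up, not computed.
parity = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)

registered = LogicBlock(parity, use_register=1)
before = registered.evaluate((1, 0, 0, 0))   # flip-flop still holds its initial 0
registered.clock_edge((1, 0, 0, 0))          # capture parity of (1, 0, 0, 0)
after = registered.evaluate((0, 0, 0, 0))    # stored value persists after inputs change
```

Changing the function passed to `make_lut` changes the logic without changing the structure, which is the sense in which the block is configurable.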
• Medical – For diagnostic, monitoring, and therapy applications, FPGAs have been
used to meet a range of processing, display, and I/O interface requirements.
• Security – FPGAs offer solutions that meet the evolving needs of security applications,
from access control to surveillance and safety systems.
• Video & Image Processing – FPGAs have been utilized in targeted design platforms
to enable higher degrees of flexibility, faster time-to-market, and lower overall non-
recurring engineering costs (NRE) for a wide range of video and imaging applications.
• Wireless Communications – FPGAs have been utilized to develop RF, base band,
connectivity, transport, and networking solutions for wireless equipment, addressing
standards such as WCDMA, HSDPA, WiMAX, and others.
In the rest of this book, we explore robotic computing, an emerging and potentially killer
application for FPGAs. With FPGAs, we can develop low-power, high-performance, cost-
effective, and flexible compute systems for various robotic workloads. Due to the advantages
provided by FPGAs, we expect that robotic applications will be a major demand driver for
FPGAs in the near future.
integrity of the applications running on those parts of the device that are not being reconfigured.
RPR allows a limited, predefined portion of an FPGA to be reconfigured while the rest of
the device continues to operate, and this feature is especially valuable where devices operate
in a mission-critical environment that cannot be disrupted while some subsystems are being
redefined.
In an SRAM-based FPGA, all user-programmable features are controlled by memory
cells that are volatile and must be configured on power-up. These memory cells are known as
the configuration memory, and they define the look-up table (LUT) equations, signal routing,
input/output block (IOB) voltage standards, and all other aspects of the design. In order to
program the configuration memory, instructions for the configuration control logic and data for
the configuration memory are provided in the form of a bitstream, which is delivered to the
device through the JTAG, SelectMAP, serial, or ICAP configuration interface. An FPGA can
be partially reconfigured using a partial bitstream. A designer can use such a partial bitstream
to change the structure of one part of an FPGA design as the rest of the device continues to
operate.
RPR is useful for systems with multiple functions that can time-share the same FPGA
device resources. In such systems, one section of the FPGA continues to operate, while other
sections of the FPGA are disabled and reconfigured to provide new functionality. This is anal-
ogous to the situation where a microprocessor manages context switching between software
processes. In the case of PR of an FPGA, however, it is the hardware instead of the software
that is being switched.
RPR provides an advantage over multiple full bitstreams in applications that require con-
tinuous operation, which would not be possible during full reconfiguration. One example is a
mobile robot that switches the perception module while keeping the localization module and
planning module intact when moving from a dark environment to a bright environment. With
RPR, the system can maintain the localization and planning modules while the perception mod-
ule within the FPGA is changed on the fly.
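The mobile robot example can be sketched as a toy model in which only the PR region changes while the static modules stay loaded. Everything here is an illustrative assumption: the class `PartiallyReconfigurableFpga`, the module names, the light threshold, and the selection policy are hypothetical and do not correspond to any vendor API.

```python
class PartiallyReconfigurableFpga:
    """Toy model: modules that keep running plus one reconfigurable region."""

    def __init__(self, static_modules, pr_module):
        self.static_modules = tuple(static_modules)  # never swapped in this model
        self.pr_module = pr_module                   # contents of the PR region
        self.reconfigurations = 0

    def load_partial_bitstream(self, module):
        """Swap only the PR region; the static modules are left untouched."""
        if module != self.pr_module:
            self.pr_module = module
            self.reconfigurations += 1

    def active_modules(self):
        return self.static_modules + (self.pr_module,)


def select_perception(ambient_lux, threshold_lux=50.0):
    """Hypothetical policy: pick a perception variant from the light level."""
    if ambient_lux >= threshold_lux:
        return "perception_bright"
    return "perception_dark"


fpga = PartiallyReconfigurableFpga(["localization", "planning"], "perception_dark")

# The robot moves from a dark area into a bright one.
for lux in (5.0, 8.0, 300.0, 450.0):
    fpga.load_partial_bitstream(select_perception(lux))
```

In this run the perception module is swapped once, at the dark-to-bright transition, and localization and planning remain in the active set throughout.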
[Figure: PR design flow. Stages shown: source file, behavioral simulation, synthesis, functional verification, layout, static analysis, partial bit file generation, and full bit file generation.]
Xilinx has provided the PR feature in their high-end FPGAs, the Virtex series, in limited
access beta since the late 1990s. More recently, it has become a production feature supported by their
tools and across their devices since the release of ISE 12. The support for this feature continues
to improve in the more recent release of ISE 13. Altera has promised this feature for their new
high-end devices, but this has not yet materialized. PR of FPGAs is a compelling design concept
for general purpose reconfigurable systems for its flexibility and extensibility.
Using the Xilinx tool chain, designers can go through the regular synthesis flow to generate
a single bitstream for programming the FPGA. This considers the device as a single atomic
entity. As opposed to the general synthesis flow, the PR flow physically divides the FPGA device
into regions. One region is called the “static region,” which is the portion of the device that is
programmed at startup and never changes. Another region is the “PR region,” which is the
portion of the device that will be reconfigured dynamically, potentially multiple times and with
different designs. It is possible to have multiple PR regions, but we will consider only the simplest
case here. The PR flow generates at least two bitstreams, one for the static and one for the
PR region. Most likely, there will be multiple PR bitstreams, one for each design that can be
dynamically loaded.
As shown in Fig. 2.3, the first step in implementing a system using the PR design flow is
the same as the regular design, which is to synthesize the netlists from the HDL sources that
will be used in the implementation and layout process. Note that the process requires separate
netlists for the static (top-level) designs and the PR partitions. A netlist must be generated for
each implementation of the PR partition used in the design. If the system design has multiple
PR partitions, then it will require a netlist for each implementation of each PR partition, even
if the logic is the same in multiple locations. Then once a netlist is done, we need to work on the
layout for each design to make sure that the netlist fits into the dedicated partition, and we need
to make sure that there are enough resources available for the design in each partition. Once
the implementation is done, we can then generate the bit file for each partition. At runtime,
we can dynamically swap different designs to a partition for the robot to adapt to the changing
environment. For more details on how to use PR on FPGAs, please refer to [57].
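The netlist requirement described above (one netlist for the static design, plus one for each implementation of each PR partition, even when the logic is repeated) can be made concrete with a short enumeration. The function `required_netlists`, the partition names, and the implementation names are hypothetical and serve only to illustrate the counting rule.

```python
def required_netlists(static_design, partitions):
    """Return one netlist name for the static design and one per
    (partition, implementation) pair."""
    netlists = [f"{static_design}.static"]
    for partition, implementations in partitions.items():
        for implementation in implementations:
            netlists.append(f"{partition}.{implementation}")
    return netlists


# Hypothetical design with two PR partitions.
partitions = {
    "pr0": ["perception_dark", "perception_bright"],
    "pr1": ["perception_dark"],  # same logic as in pr0, but still its own netlist
}

netlists = required_netlists("top", partitions)
```

Here four netlists are needed, because `perception_dark` is counted once for each partition in which it can be placed.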
• a streaming engine implemented with a FIFO queue to buffer data between the con-
sumer and the producer to eliminate the handshake between the producer and the
consumer for each data transfer; and
• burst mode enabled for ICAP, so that it can fetch four words at a time instead of one.
We will explain this design in greater detail in the following sections.
[Figure: block diagram of the ICAP design. Labeled components: system bus, ICAP controller, SRAM controller, SRAM bridge, ICAP FSM, primary DMA, secondary DMA, FIFO, SRAM interface, and ICAP.]
the SRAM Interface. Hence, there is no direct memory access between SRAM and ICAP, and
all configuration data transfers are done in software. In this way, the pipeline issues one read
instruction to fetch a configuration word from SRAM, and then issues a write instruction to
send the word to ICAP; instructions are also fetched from SRAM, and this process repeats
until the transfer process completes. This scheme is highly inefficient because the transfer of one
word requires tens of cycles, and the ICAP transfer throughput of this design is only 318 KB/s,
whereas according to the product specification, the ideal ICAP throughput is 400 MB/s. Hence, the
out-of-box design throughput is more than 1,000 times lower than that of the ideal design.
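The gap between the two throughput figures can be checked with simple arithmetic. The two rates come from the text; the decimal interpretation of KB and MB and the 100,000-byte partial bitstream size are assumptions made only for illustration.

```python
def transfer_time_s(num_bytes, throughput_bytes_per_s):
    """Time needed to move num_bytes at a constant throughput."""
    return num_bytes / throughput_bytes_per_s


OUT_OF_BOX_BPS = 318e3  # 318 KB/s from the text (decimal units assumed)
IDEAL_BPS = 400e6       # 400 MB/s from the text (decimal units assumed)

# Ratio between the ideal and the out-of-box throughput.
slowdown = IDEAL_BPS / OUT_OF_BOX_BPS

# Hypothetical partial bitstream size, chosen only to give concrete times.
bitstream_bytes = 100_000
t_out_of_box = transfer_time_s(bitstream_bytes, OUT_OF_BOX_BPS)
t_ideal = transfer_time_s(bitstream_bytes, IDEAL_BPS)
```

Under these assumptions the ratio is roughly 1,260, and the same bitstream that loads in about a quarter of a millisecond at the ideal rate takes about a third of a second with the software-driven transfer.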
Energy Efficiency
In [59], the authors indicate that the polarity of the FPGA hardware structures may significantly
impact leakage power consumption. Based on this observation, the authors of [60] tried to find
out whether FPGAs utilize this property such that when the blank bitstream is loaded to wipe
out an accelerator, the circuit is set to a state to minimize the leakage power consumption. In
order to achieve this, the authors implemented eight PR regions on an FPGA chip, with each
region occupying a configuration frame. These eight PR regions did not consume any dynamic
power, as the authors purposely gated off the clock to these regions. Then the authors used the
blank bitstream files to wipe out each of these regions and observed the chip power consumption
behavior. The results indicated that for every four configuration frames that we applied the blank
bitstream on, the chip power consumption dropped by a constant amount. This study confirms
that PR indeed leads to static power reduction and suggests that FPGAs may have utilized the
polarity property to minimize leakage power.
In addition, the authors of [60] studied whether PR can be used as an effective energy re-
duction technique in reconfigurable computing systems. To approach this problem, the authors
first identified the analytical models that capture the necessary conditions for energy reduc-
tion under different system configurations. The models show that increasing the configuration
throughput is a general and effective way to minimize the PR energy overhead. Therefore, the
authors designed and implemented a fully streaming DMA engine that nearly saturates the
configuration throughput.
The findings provide answers to the three questions: first, although we pay extra power to
use an accelerator, depending on the accelerator’s ability to accelerate the program execution, it
will result in actual energy reduction. The experimental results in [60] demonstrate that due to
its low power overhead and excellent ability of acceleration, having an acceleration extension can
lead to both program speedup and system energy reduction. Second, it is worthwhile to use PR
to reduce chip energy consumption if the energy reduction can make up for the energy overhead
incurred during the reconfiguration process; the key to minimizing the energy overhead during
the reconfiguration process is to maximize the configuration speed. The experimental results
in [60] confirm that enabling PR is a highly effective energy reduction technique. Finally, clock
gating is an effective technique in reducing energy consumption due to its negligible overhead;
however, it reduces only dynamic power whereas PR reduces both dynamic and static power.
Therefore, PR can lead to a larger energy reduction than clock gating, provided the extra energy
saving on static power elimination can make up for the energy overhead incurred during the
reconfiguration process.
Although the conventional wisdom is that PR is only useful if the accelerator would not
be used for a very long period of time, the experimental results in [60] indicate that with the
high configuration throughput delivered by the fast PR engine, PR can outperform clock gating
in energy reduction even if the accelerator inactive time is in the millisecond range. In summary,
based on the results from [58] and [60], we can conclude that PR is an effective technique for
improving both performance and energy efficiency, and it is the key feature that makes FPGAs
a highly attractive choice for dynamic robotic computing workloads.
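The trade-off between PR and clock gating can be sketched with a simple break-even calculation. This is a deliberately simplified model and not the analytical model of [60]: it assumes that clock gating leaves the static power of the idle accelerator unchanged, that PR removes that static power entirely, and that PR costs two reconfigurations (one to blank the region and one to restore it). All power values and the bitstream size are hypothetical.

```python
def reconfig_energy_j(bitstream_bytes, throughput_bytes_per_s,
                      config_power_w, num_reconfigs=2):
    """Energy spent loading bitstreams (default: one blank, one to restore)."""
    return num_reconfigs * config_power_w * bitstream_bytes / throughput_bytes_per_s


def break_even_idle_s(static_power_w, overhead_j):
    """Idle time beyond which the static energy saved by PR exceeds its overhead."""
    return overhead_j / static_power_w


# Hypothetical numbers for illustration only.
BITSTREAM_BYTES = 100_000
CONFIG_POWER_W = 0.5    # power drawn while reconfiguring
STATIC_POWER_W = 0.05   # static power of the idle accelerator region

fast_overhead = reconfig_energy_j(BITSTREAM_BYTES, 400e6, CONFIG_POWER_W)
slow_overhead = reconfig_energy_j(BITSTREAM_BYTES, 318e3, CONFIG_POWER_W)

fast_break_even = break_even_idle_s(STATIC_POWER_W, fast_overhead)
slow_break_even = break_even_idle_s(STATIC_POWER_W, slow_overhead)
```

With these assumed values, the break-even idle time is about 5 ms at 400 MB/s but more than 6 s at 318 KB/s, which is consistent with the observation that a fast configuration engine makes PR worthwhile even for idle periods in the millisecond range.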
• Thin: ROS is designed to be as thin as possible so that code written for ROS can be
used with other robot software frameworks.
• Easy testing: ROS has a built-in unit/integration test framework called rostest that
makes it easy to bring up and tear down test fixtures.
• Scaling: ROS is appropriate for large runtime systems and large development processes.
The Computation Graph is the peer-to-peer network of ROS processes that are processing
data together. The basic Computation Graph concepts of ROS are nodes, Master, Parameter
Server, messages, services, topics, and bags, all of which provide data to the Graph in different
ways.
• Nodes: nodes are processes that perform computation. ROS is designed to be modular
at a fine-grained scale; a robot control system usually comprises many nodes. Taking
autonomous vehicles as an example, one node controls a laser range-finder, one node
controls the wheel motors, one node performs localization, one node performs path
planning, one node provides a graphical view of the system, and so on. A ROS node
is written with the use of a ROS client library, such as roscpp or rospy.
• Master: the ROS Master provides name registration and lookup to the rest of the
Computation Graph. Without the Master, nodes would not be able to find each other,
exchange messages, or invoke services.
• Parameter Server: the parameter server allows data to be stored by key in a central
location. It is currently part of the Master.
• Messages: nodes communicate with each other by passing messages. A message is
simply a data structure, comprising typed fields. Standard primitive types (integer,
floating-point, boolean, etc.) are supported, as are arrays of primitive types. Messages
can include arbitrarily nested structures and arrays (much like C structs).
• Topics: messages are routed via a transport system with publish-subscribe semantics.
A node sends out a message by publishing it to a given topic. The topic is a name that
is used to identify the content of the message. A node that is interested in a certain
kind of data will subscribe to the appropriate topic. There may be multiple concurrent
publishers and subscribers for a single topic, and a single node may publish and sub-
scribe to multiple topics. In general, publishers and subscribers are not aware of each
others’ existence. The idea is to decouple the production of information from its con-
sumption. Logically, one can think of a topic as a strongly typed message bus. Each
bus has a name, and anyone can connect to the bus to send or receive messages as long
as they are the right type.
• Services: the publish-subscribe model is a very flexible communication paradigm, but
its many-to-many, one-way transport is not appropriate for request-reply interactions,
which are often required in a distributed system. Request-reply is done via services,
which are defined by a pair of message structures: one for the request and one for the
reply. A providing node offers a service under a name and a client uses the service
by sending the request message and awaiting the reply. ROS client libraries generally
present this interaction to the programmer as if it were a remote procedure call.
• Bags: bags are a format for saving and playing back ROS message data. Bags are an
important mechanism for storing data, such as sensor data, that can be difficult to
collect but is necessary for developing and testing algorithms.
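The idea of a topic as a strongly typed message bus, with publishers and subscribers unaware of each other, can be illustrated with an in-process sketch. This code does not use ROS or any ROS client library: `TopicBus`, `LaserScan`, and the topic name `/scan` are hypothetical, and real ROS topics are carried over the network rather than through direct function calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LaserScan:
    """Hypothetical message type: a data structure with typed fields."""
    ranges: list
    frame_id: str


class TopicBus:
    """In-process stand-in for topics with publish-subscribe semantics."""

    def __init__(self):
        self._types = {}                       # topic name -> message type
        self._subscribers = defaultdict(list)  # topic name -> callbacks

    def subscribe(self, topic, msg_type, callback):
        self._check_type(topic, msg_type)
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        self._check_type(topic, type(message))
        for callback in self._subscribers[topic]:
            callback(message)

    def _check_type(self, topic, msg_type):
        """The first user of a topic fixes its type; later users must match it."""
        expected = self._types.setdefault(topic, msg_type)
        if expected is not msg_type:
            raise TypeError(
                f"topic {topic!r} carries {expected.__name__}, "
                f"not {msg_type.__name__}"
            )


bus = TopicBus()
received = []
nearest = []

# Two independent subscribers; neither knows about the publisher or each other.
bus.subscribe("/scan", LaserScan, received.append)
bus.subscribe("/scan", LaserScan, lambda msg: nearest.append(min(msg.ranges)))

bus.publish("/scan", LaserScan(ranges=[1.5, 0.8, 2.0], frame_id="laser"))
```

The publisher only names the topic, so subscribers can be added or removed without changing the publishing code, which is the decoupling described above.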
The ROS Master acts as a name service in the ROS Computation Graph. It stores topics
and services registration information for ROS nodes. Nodes communicate with the Master to
report their registration information. As these nodes communicate with the Master, they can
receive information about other registered nodes and make connections as appropriate. The Master
will also make callbacks to these nodes when this registration information changes, which allows
nodes to dynamically create connections as new nodes are run.
[Figure: ROS-compliant FPGA component on an ARM-FPGA SoC. Labeled elements: FPGA input interface, FPGA output interface, FPGA applications, and ROS node.]
Nodes connect to other nodes directly; the Master only provides lookup information,
much like a domain name service (DNS) server. Nodes that subscribe to a topic will request
connections from nodes that publish that topic and will establish that connection over an agreed-
upon connection protocol. This architecture allows for decoupled operations, where the names
are the primary means by which larger and more complex systems can be built. Names have a
very important role in ROS: nodes, topics, services, and parameters all have names. Every ROS
client library supports command-line remapping of names, which means a compiled program
can be reconfigured at runtime to operate in a different Computation Graph topology.
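The role of the Master as a lookup service, including the callbacks it makes when registration information changes, can be sketched as follows. This is a toy in-process model, not the actual Master API: the class and method names are hypothetical, the address string is made up, and the real Master is reached over the network.

```python
class Master:
    """Toy name service: records publishers per topic and notifies subscribers."""

    def __init__(self):
        self._publishers = {}   # topic -> list of publisher addresses
        self._subscribers = {}  # topic -> list of notification callbacks

    def register_publisher(self, topic, address):
        """Record a publisher and notify subscribers already waiting on the topic."""
        self._publishers.setdefault(topic, []).append(address)
        for callback in self._subscribers.get(topic, []):
            callback(topic, list(self._publishers[topic]))

    def register_subscriber(self, topic, callback):
        """Record a subscriber and return the publishers known so far."""
        self._subscribers.setdefault(topic, []).append(callback)
        return list(self._publishers.get(topic, []))


master = Master()
updates = []

# The subscriber starts first, so no publisher is known yet.
known = master.register_subscriber(
    "/scan", lambda topic, pubs: updates.append((topic, pubs))
)

# When a publisher appears later, the Master calls the subscriber back.
master.register_publisher("/scan", "10.0.0.2:40001")
```

Note that the Master only hands out addresses; in this model, as in the text, the data connection itself would be made directly between the two nodes.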
• STEP (1): the Publisher and Subscriber nodes register their node and topic information
with the Master node. The registration is done by calling methods such as
registerPublisher, hasParam, and so on, using XMLRPC [68].
• STEP (2): the Master node notifies the Subscriber nodes of the topic information by
calling publisherUpdate (XMLRPC).
• STEP (3): the Subscriber node sends a connection request to the Publisher node by
using requestTopic (XMLRPC).
• STEP (4): the Publisher node returns IP address and port number, TCP connection
information for data communication, as a response to the requestTopic (XMLRPC).
• STEP (5): the Subscriber node establishes a TCP connection by using the information
and sends connection header to the TCP connection. Connection header contains im-
portant metadata about a connection being established, including typing and routing
information, using TCPROS [69].
• STEP (6): if it is a successful connection, the Publisher node sends connection header
(TCPROS).
• STEP (7): data transmission repeats. The data is written in little-endian byte order, and
header information (4 bytes) is added to the data (TCPROS).
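The framing in STEP (7) can be illustrated with a short sketch. It assumes that the 4-byte header is an unsigned little-endian length of the payload that follows; the function names `frame` and `unframe` are hypothetical, and the payload is treated as opaque bytes rather than a serialized ROS message.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte little-endian integer."""
    return struct.pack("<I", len(payload)) + payload


def unframe(stream: bytes):
    """Split a byte stream into complete payloads.

    Returns (messages, remainder), where remainder holds any trailing bytes
    that do not yet form a complete frame.
    """
    messages = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("<I", stream, offset)
        start = offset + 4
        end = start + length
        if end > len(stream):
            break  # incomplete message; more bytes are needed
        messages.append(stream[start:end])
        offset = end
    return messages, stream[offset:]


# Two complete frames followed by a partial header, as might arrive over TCP.
stream = frame(b"hello") + frame(b"") + b"\x02\x00"
messages, remainder = unframe(stream)
```

Because TCP delivers a byte stream rather than discrete messages, the receiver relies on this length prefix to find message boundaries, which is why very little extra overhead is needed once the connection is established.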
After this analysis, the authors found out that network packets that flowed in Pub-
lish/Subscribe messaging in the ROS system can be categorized into two parts, that is, the
registration part and the data transmission part. The registration part uses XMLRPC (STEPS
(1)–(4)), while the data transmission part uses TCPROS (STEPS (5)–(7)), which is almost
raw data of TCP communication with very small overhead. In addition, once data transmission
(STEP (7)) starts, only data transmission repeats without STEPS (1)–(6).
Based on the network packet analysis, the authors modified the server ports, such that
those used in XMLRPC and TCPROS are assigned differently. In addition, a client TCP/IP
connection of XMLRPC for the Master node is necessary for the Publisher node. For the
Subscriber node, two client TCP/IP connections of XMLRPC and one client connection of
TCPROS are necessary. Therefore, two or three TCP ports are necessary to implement
Publish/Subscribe messaging. This requirement is a problem when ROS nodes are implemented
using a hardware TCP/IP stack.
To optimize the communication performance on ROS-compliant FPGAs, the authors
proposed hardware publication and subscription services. Conventionally, publication or sub-
scription of topics was done by software in ROS. By implementing these nodes as hardwired
circuits, direct communication between the ROS nodes and the FPGA becomes not only pos-
sible but also highly efficient. In order to implement the hardware ROS nodes, the authors
designed the Subscriber hardware and the Publisher hardware separately: the Subscriber hardware
is responsible for subscribing to a topic of another ROS node and receiving ROS messages
from the topic, whereas the Publisher hardware is responsible for publishing ROS messages to a
topic of another ROS node. With this hardware-based design, the evaluation results indicate
that the latency of the hardware ROS-compliant FPGA component can be cut in half, from
1.0 ms to 0.5 ms, thus effectively improving the communication between the FPGA accelerator
and other software-based ROS nodes.
2.4 SUMMARY
In this chapter, we have provided a general introduction to FPGA technologies, especially run-
time partial reconfiguration, which allows multiple robotic workloads to time-share an FPGA
at runtime. We have also introduced existing research on enabling ROS on FPGAs, which provides
infrastructure support for various robotic workloads to run directly on FPGAs. However,
the ecosystem of robotic computing on FPGAs is still in its infancy. For instance, due to the
lack of high-level synthesis tools for robotic accelerator design, accelerating robotic workloads,
or part of a robotic workload, on FPGAs still requires extensive manual effort. To make
matters worse, most robotic engineers do not have sufficient FPGA background to develop an
FPGA-based accelerator, whereas few FPGA engineers possess sufficient robotic background
to fully understand a robotic system. Hence, to fully exploit the benefits of FPGAs, advanced
design automation tools are imperative to bridge this knowledge gap.
CHAPTER 3
investigation from software to hardware, from circuit level to system level, is carried out for
a complete analysis of FPGA-based deep learning accelerators and serves as a guide to future
work.
The first of these theories, that the beautiful is the true, we leave
entirely to the tender mercies of Mr Ruskin; we cannot gather from
his refutation to what class of theorists he is alluding. The remaining
three are, as we understand the matter, substantially one and the
same theory. We believe that no one, in these days, would define
beauty as solely resulting either from the apprehension of Utility,
(that is, the adjustment of parts to a whole, or the application of the
object to an ulterior purpose,) or to Familiarity and the affection
which custom engenders; but they would regard both Utility and
Familiarity as amongst the sources of those agreeable ideas or
impressions, which, by the great law of association, became
intimately connected with the visible object. We must listen,
however, to Mr Ruskin's refutation of them:—
Now this last sentence is sheer nonsense, and only proves that the
author had never given himself the trouble to understand the theory
he so flippantly discards. No one ever said that "association gives
pleasure;" but very many, and Mr Ruskin amongst the rest, have
said that associated thought adds its pleasure to an object pleasing
in itself, and thus increases the complex sentiment of beauty. That it
is a complex sentiment in all its higher forms, Mr Ruskin himself will
tell us. As to the manner in which he deals with Alison, it is in the
worst possible spirit of controversy. Alison was an elegant, but not a
very precise writer; it was the easiest thing in the world to select an
unfortunate illustration, and to convict that of absurdity. Yet he
might with equal ease have selected many other illustrations from
Alison, which would have done justice to the theory he expounds. A
hundred such will immediately occur to the reader. If, instead of a
historical recollection of this kind, which could hardly make the
stream itself of Runnymede look more beautiful, Alison had confined
himself to those impressions which the generality of mankind receive
from river scenery, he would have had no difficulty in showing (as
we believe he has elsewhere done) how, in this case, ideas gathered
from different sources flow into one harmonious and apparently
simple feeling. That sentiment of beauty which arises as we look
upon a river will be acknowledged by most persons to be composed
of many associated thoughts, combining with the object before
them. Its form and colour, its bright surface and its green banks, are
all that the eye immediately gives us; but with these are combined
the remembered coolness of the fluent stream, and of the breeze
above it, and of the pleasant shade of its banks; and beside all this—
as there are few persons who have not escaped with delight from
town or village, to wander by the quiet banks of some neighbouring
stream, so there are few persons who do not associate with river
scenery ideas of peace and serenity. Now many of these thoughts or
facts are such as the eye does not take cognisance of, yet they
present themselves as instantaneously as the visible form, and so
blended as to seem, for the moment, to belong to it.
Why not have selected some such illustration as this, instead of
the unfortunate Runnymede, from a work where so many abound as
apt as they are elegantly expressed? As to Mr Ruskin's utilitarian
philosopher, it is a fabulous creature—no such being exists. Nor need
we detain ourselves with the quite departmental subject of
Familiarity. But let us endeavour—without desiring to pledge
ourselves or our readers to its final adoption—to relieve the theory
of association of ideas from the obscurity our author has thrown
around it. Our readers will not find that this is altogether a wasted
labour.
With Mr Ruskin we are of opinion that, in a discussion of this kind,
the term Beauty ought to be limited to the impression derived,
mediately or immediately, from the visible object. It would be
useless affectation to attempt to restrict the use of the word, in
general, to this application. We can have no objection to the term
Beautiful being applied to a piece of music, or to an eloquent
composition, prose or verse, or even to our moral feelings and heroic
actions; the word has received this general application, and there is,
at basis, a great deal in common between all these and the
sentiment of beauty attendant on the visible object. For music, or
sweet sounds, and poetry, and our moral feelings, have much to do
(through the law of association) with our sentiment of the Beautiful.
It is quite enough if, speaking of the subject of our analysis, we limit
it to those impressions, however originated, which attend upon the
visible object.
One preliminary word on this association of ideas. It is from its
very nature, and the nature of human life, of all degrees of intimacy
—from the casual suggestion, or the case where the two ideas are at
all times felt to be distinct, to those close combinations where the
two ideas have apparently coalesced into one, or require an
attentive analysis to separate them. You see a mass of iron; you may
be said to see its weight, the impression of its weight is so intimately
combined with its form. The light of the sun, and the heat of the sun
are learnt from different senses, yet we never see the one without
thinking of the other, and the reflection of the sunbeam seen upon a
bank immediately suggests the idea of warmth. But it is not
necessary that the combination should be always so perfect as in
this instance, in order to produce the effect we speak of under the
name of Association of Ideas. It is hardly possible for us to abstract
the glow of the sunbeam from its light; but the fertility which follows
upon the presence of the sun, though a suggestion which habitually
occurs to reflective minds, is an association of a far less intimate
nature. It is sufficiently intimate, however, to blend with that feeling
of admiration we have when we speak of the beauty of the sun.
There is the golden harvest in its summer beams. Again, the
contemplative spirit in all ages has formed an association between
the sun and the Deity, whether as the fittest symbol of God, or as
being His greatest gift to man. Here we have an association still
more refined, and of a somewhat less frequent character, but one
which will be found to enter, in a very subtle manner, into that
impression we receive from the great luminary.
And thus it is that, in different minds, the same materials of
thought may be combined in a closer or laxer relationship. This
should be borne in mind by the candid inquirer. That in many
instances ideas from different sources do coalesce, in the manner
we have been describing, he cannot for an instant doubt. He seems
to see the coolness of that river; he seems to see the warmth on
that sunny bank. In many instances, however, he must make
allowance for the different habitudes of life. The same illustration will
not always have the same force to all men. Those who have
cultivated their minds by different pursuits, or lived amongst scenery
of a different character, cannot have formed exactly the same moral
association with external nature.
These preliminaries being adjusted, what, we ask, is that first
original charm of the visible object which serves as the foundation
for this wonderful superstructure of the Beautiful, to which almost
every department of feeling and of thought will be found to bring its
contribution? What is it so pleasurable that the eye at once receives
from the external world, that round it should have gathered all these
tributary pleasures? Light—colour—form; but, in reference to our
discussion, pre-eminently the exquisite pleasure derived from the
sense of light, pure or coloured. Colour, from infancy to old age, is
one original, universal, perpetual source of delight, the first and
constant element of the Beautiful.
We are far from thinking that the eye does not at once take
cognisance of form as well as colour. Some ingenious analysts have
supposed that the sensation of colour is, in its origin, a mere mental
affection, having no reference to space or external objects, and that
it obtains this reference through the contemporaneous acquisition of
the sense of touch. But there can be no more reason for supposing
that the sense of touch informs us immediately of an external world
than that the sense of colour does. If we do not allow to all the
senses an intuitive reference to the external world, we shall get it
from none of them. Dr Brown, who paid particular attention to this
subject, and who was desirous to limit the first intimation of the
sense of sight to an abstract sensation of unlocalised colour, failed
entirely in his attempt to obtain from any other source the idea of
space or outness; Kant would have given him certain subjective
forms of the sensitive faculty, space and time. These he did not like:
he saw that, if he denied to the eye an immediate perception of the
external world, he must also deny it to the touch; he therefore
prayed in aid certain muscular sensations from which the idea of
resistance would be obtained. But it seems to us evident that not till
after we have acquired a knowledge of the external world can we
connect volition with muscular movement, and that, until that
connection is made, the muscular sensations stand in the same
predicament as other sensations, and could give him no aid in
solving his problem. We cannot go further into this matter at
present.[6] The mere flash of light which follows the touch upon the
optic nerve represents itself as something without; nor was colour,
we imagine, ever felt, but under some form more or less distinct;
although in the human being the eye seems to depend on the touch
far more than in other animals, for its further instruction.
But although the eye is cognisant of form as well as colour, it is in
the sensation of colour that we must seek the primitive pleasure
derived from this organ. And probably the first reason why form
pleases is this, that the boundaries of form are also the lines of
contrast of colour. It is a general law of all sensation that, if it be
continued, our susceptibility to it declines. It was necessary that the
eye should be always open. Its susceptibility is sustained by the
perpetual contrast of colours. Whether the contrast is sudden, or
whether one hue shades gradually into another, we see here an
original and primary source of pleasure. A constant variety, in some
way produced, is essential to the maintenance of the pleasure
derived from colour.
It is not incumbent on us to inquire how far the beauty of form
may be traceable to the sensation of touch;—a very small portion of
it we suspect. In the human countenance, and in sculpture, the
beauty of form is almost resolvable into expression; though possibly
the soft and rounded outline may in some measure be associated
with the sense of smoothness to the touch. All that we are
concerned to show is, that there is here in colour, diffused as it is
over the whole world, and perpetually varied, a beauty at once
showered upon the visible object. We hear it said, if you resolve all
into association, where will you begin? You have but a circle of
feelings. If moral sentiment, for instance, be not itself the beautiful,
why should it become so by association? There must be something
else that is the beautiful, by association with which it passes for
such. We answer, that we do not resolve all into association; that we
have in this one gift of colour, shed so bountifully over the whole
world, an original beauty, a delight which makes the external object
pleasant and beloved; for how can we fail, in some sort, to love
what produces so much pleasure?
We are at a loss to understand how any one can speak with
disparagement of colour as a source of the beautiful. The sculptor
may, perhaps, by his peculiar education, grow comparatively
indifferent to it: we know not how this may be; but let any man, of
the most refined taste imaginable, think what he owes to this
source, when he walks out at evening, and sees the sun set
amongst the hills. The same concave sky, the same scene, so far as
its form is concerned, was there a few hours before, and saddened
him with its gloom; one leaden hue prevailed over all; and now in a
clear sky the sun is setting, and the hills are purple, and the clouds
are radiant with every colour that can be extracted from the
sunbeam. He can hardly believe that it is the same scene, or he the
same man. Here the grown-up man and the child stand always on
the same level. As to the infant, note how its eye feeds upon a
brilliant colour, or the living flame. If it had wings, it would assuredly
do as the moth does. And take the most untutored rustic, let him be
old, and dull, and stupid, yet, as long as the eye has vitality in it, will
he look up with long untiring gaze at this blue vault of the sky,
traversed by its glittering clouds, and pierced by the tall green trees
around him.
Is it any marvel now that round the visible object should associate
tributary feelings of pleasure? How many pleasing and tender
sentiments gather round the rose! Yet the rose is beautiful in itself.
It was beautiful to the child by its colour, its texture, its softly-
shaded leaf, and the contrast between the flower and the foliage.
Love, and poetry, and the tender regrets of advanced life, have
contributed a second dower of beauty. The rose is more to the youth
and to the old man than it was to the child; but still to the last they
both feel the pleasure of the child.
The more commonplace the illustration, the more suited it is to
our purpose. If any one will reflect on the many ideas that cluster
round this beautiful flower, he will not fail to see how numerous and
subtle may be the association formed with the visible object. Even
an idea painful in itself may, by way of contrast, serve to heighten
the pleasure of others with which it is associated. Here the thought
of decay and fragility, like a discord amongst harmonies, increases
our sentiment of tenderness. We express, we believe, the prevailing
taste when we say that there is nothing, in the shape of art, so
disagreeable and repulsive as artificial flowers. The waxen flower
may be an admirable imitation, but it is a detestable thing. This
partly results from the nature of the imitation; a vulgar deception is
often practised upon us: what is not a flower is intended to pass for
one. But it is owing still more, we think, to the contradiction that is
immediately afterwards felt between this preserved and imperishable
waxen flower, and the transitory and perishable rose. It is the nature
of the rose to bud, and blossom, and decay; it gives its beauty to
the breeze and to the shower; it is mortal; it is ours; it bears our
hopes, our loves, our regrets. This waxen substitute, that cannot
change or decay, is a contradiction and a disgust.
Amongst objects of man's contrivance, the sail seen upon the calm
waters of a lake or a river is universally felt to be beautiful. The form
is graceful, and the movement gentle, and its colour contrasts well
either with the shore or the water. But perhaps the chief element of
our pleasure is an association with human life, with peaceful
enjoyment—
"This quiet sail is as a noiseless wing,
To waft me from distraction."