
Series ISSN: 1935-3235

Synthesis Lectures on
Computer Architecture

Series Editor: Natalie Enright Jerger, University of Toronto


Robotic Computing on FPGAs
Shaoshan Liu, PerceptIn
Zishen Wan, Georgia Institute of Technology
Bo Yu, PerceptIn
Yu Wang, Tsinghua University


This book provides a thorough overview of the state-of-the-art field-programmable gate
array (FPGA)-based robotic computing accelerator designs and summarizes their adopted
optimized techniques. This book consists of ten chapters, delving into the details of how
FPGAs have been utilized in robotic perception, localization, planning, and multi-robot
collaboration tasks. In addition to individual robotic tasks, this book provides detailed
descriptions of how FPGAs have been used in robotic products, including commercial
autonomous vehicles and space exploration robots.

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science. Synthesis
books provide concise, original presentations of important research and
development topics, published quickly, in digital and print formats.

Synthesis Lectures on
Computer Architecture
store.morganclaypool.com
Natalie Enright Jerger, Series Editor
Robotic Computing
on FPGAs
Synthesis Lectures on
Computer Architecture
Editor
Natalie Enright Jerger, University of Toronto
Editor Emerita
Margaret Martonosi, Princeton University
Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison
Synthesis Lectures on Computer Architecture publishes 50- to 100-page books on topics pertaining to
the science and art of designing, analyzing, selecting, and interconnecting hardware components to
create computers that meet functional, performance, and cost goals. The scope will largely follow
the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and
ASPLOS.
Robotic Computing on FPGAs
Shaoshan Liu, Zishen Wan, Bo Yu, and Yu Wang
2021
AI for Computer Architecture: Principles, Practice, and Prospects
Lizhong Chen, Drew Penney, and Daniel Jiménez
2020
Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale
Production
Andres Rodriguez
2020
Parallel Processing, 1980 to 2020
Robert Kuhn and David Padua
2020
Data Orchestration in Deep Learning Accelerators
Tushar Krishna, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, and Ananda Samajdar
2020

Efficient Processing of Deep Neural Networks


Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
2020

Quantum Computer System: Research for Noisy Intermediate-Scale Quantum Computers
Yongshan Ding and Frederic T. Chong
2020

A Primer on Memory Consistency and Cache Coherence, Second Edition


Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David Wood
2020

Innovations in the Memory System


Rajeev Balasubramonian
2019

Cache Replacement Policies


Akanksha Jain and Calvin Lin
2019

The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition


Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan
2018

Principles of Secure Processor Architecture Design


Jakub Szefer
2018

General-Purpose Graphics Processor Architectures


Tor M. Aamodt, Wilson Wai Lun Fung, and Timothy G. Rogers
2018

Compiling Algorithms for Heterogeneous Systems


Steven Bell, Jing Pu, James Hegarty, and Mark Horowitz
2018

Architectural and Operating System Support for Virtual Memory


Abhishek Bhattacharjee and Daniel Lustig
2017

Deep Learning for Computer Architects


Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks
2017

On-Chip Networks, Second Edition


Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh
2017
Space-Time Computing with Temporal Neural Networks
James E. Smith
2017

Hardware and Software Support for Virtualization


Edouard Bugnion, Jason Nieh, and Dan Tsafrir
2017

Datacenter Design and Management: A Computer Architect’s Perspective


Benjamin C. Lee
2016

A Primer on Compression in the Memory Hierarchy


Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A. Wood
2015

Research Infrastructures for Hardware Accelerators


Yakun Sophia Shao and David Brooks
2015

Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015

Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015

Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015

Single-Instruction Multiple-Data Execution


Christopher J. Hughes
2015

Power-Efficient Computer Architectures: Recent Advances


Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras
2014

FPGA-Accelerated Simulation of Computer Systems


Hari Angepat, Derek Chiou, Eric S. Chung, and James C. Hoe
2014

A Primer on Hardware Prefetching


Babak Falsafi and Thomas F. Wenisch
2014
On-Chip Photonic Interconnects: A Computer Architect’s Perspective
Christopher J. Nitta, Matthew K. Farrens, and Venkatesh Akella
2013

Optimization and Mathematical Modeling in Computer Architecture


Tony Nowatzki, Michael Ferris, Karthikeyan Sankaralingam, Cristian Estan, Nilay Vaish, and
David Wood
2013

Security Basics for Computer Architects


Ruby B. Lee
2013

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
2013

Shared-Memory Synchronization
Michael L. Scott
2013

Resilient Architecture Design for Voltage Variation


Vijay Janapa Reddi and Meeta Sharma Gupta
2013

Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013

Performance Analysis and Tuning for General Purpose Graphics Processing Units
(GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012

Automatic Parallelization: An Overview of Fundamental Compiler Techniques


Samuel P. Midkiff
2012

Phase Change Memory: From Devices to Systems


Moinuddin K. Qureshi, Sudhanva Gurumurthi, and Bipin Rajendran
2011

Multi-Core Cache Hierarchies


Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011

Dynamic Binary Modification: Tools, Techniques, and Applications


Kim Hazelwood
2011

Quantum Computing for Computer Architects, Second Edition


Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities


Dennis Abts and John Kim
2011

Processor Microarchitecture: An Implementation Perspective


Antonio González, Fernando Latorre, and Grigorios Magklis
2010

Transactional Memory, Second Edition


Tim Harris, James Larus, and Ravi Rajwar
2010

Computer Architecture Performance Evaluation Methods


Lieven Eeckhout
2010

Introduction to Reconfigurable Supercomputing


Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009

On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009

The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009

Fault Tolerant Computer Architecture


Daniel J. Sorin
2009

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso and Urs Hölzle
2009
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
2008

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency


Kunle Olukotun, Lance Hammond, and James Laudon
2007

Transactional Memory
James R. Larus and Ravi Rajwar
2006

Quantum Computing for Computer Architects


Tzvetan S. Metodi and Frederic T. Chong
2006
Copyright © 2021 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.

Robotic Computing on FPGAs


Shaoshan Liu, Zishen Wan, Bo Yu, and Yu Wang
www.morganclaypool.com

ISBN: 9781636391656 paperback


ISBN: 9781636391663 ebook
ISBN: 9781636391670 hardcover

DOI 10.2200/S01101ED1V01Y202105CAC056

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE

Lecture #56
Series Editor: Natalie Enright Jerger, University of Toronto
Editor Emerita: Margaret Martonosi, Princeton University
Founding Editor Emeritus: Mark D. Hill, University of Wisconsin, Madison
Series ISSN
Print 1935-3235 Electronic 1935-3243
Robotic Computing
on FPGAs

Shaoshan Liu
PerceptIn

Zishen Wan
Georgia Institute of Technology

Bo Yu
PerceptIn

Yu Wang
Tsinghua University

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #56

Morgan & Claypool Publishers
ABSTRACT
This book provides a thorough overview of the state-of-the-art field-programmable gate array
(FPGA)-based robotic computing accelerator designs and summarizes their adopted optimized
techniques. This book consists of ten chapters, delving into the details of how FPGAs have been
utilized in robotic perception, localization, planning, and multi-robot collaboration tasks. In
addition to individual robotic tasks, this book provides detailed descriptions of how FPGAs have
been used in robotic products, including commercial autonomous vehicles and space exploration
robots.

KEYWORDS
robotics, FPGAs, autonomous machines, perception, localization, planning, con-
trol, space exploration, deep learning

Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

1 Introduction and Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Planning and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 FPGAs in Robotic Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 The Deep Processing Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 FPGA Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 An Introduction to FPGA Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Types of FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 FPGA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Commercial Applications of FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Partial Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 What is Partial Reconfiguration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 How to Use Partial Reconfiguration? . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Achieving High Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 Real-World Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Robot Operating System (ROS) on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Robot Operating System (ROS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.2 ROS-Compliant FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 Optimizing Communication Latency for the ROS-Compliant
FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 Perception on FPGAs – Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


3.1 Why Choose FPGAs for Deep Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Preliminary: Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Design Methodology and Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Hardware-Oriented Model Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.1 Data Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.2 Weight Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5 Hardware Design: Efficient Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.1 Computation Unit Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.2 Loop Unrolling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5.3 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Perception on FPGAs – Stereo Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55


4.1 Perception in Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Stereo Vision in Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Local Stereo Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 Global Stereo Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.1 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Semi-Global Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Efficient Large-Scale Stereo Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . 63
4.6.1 ELAS Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7 Evaluation and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.7.1 Dataset and Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.7.2 Power and Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5 Localization on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.1 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.2 Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Frontend FPGA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.2 Exploiting Task-Level Parallelisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4 Backend FPGA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.2 Resource Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.5.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

6 Planning on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1 Motion Planning Context Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1.1 Probabilistic Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.1.2 Rapidly Exploring Random Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2 Collision Detection on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.2.1 Motion Planning Compute Time Profiling . . . . . . . . . . . . . . . . . . . . . 94
6.2.2 General Purpose Processor-Based Solutions . . . . . . . . . . . . . . . . . . . . 95
6.2.3 Specialized Hardware Accelerator-Based Solutions . . . . . . . . . . . . . . 97
6.2.4 Evaluation and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3 Graph Search on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

7 Multi-Robot Collaboration on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109


7.1 Multi-Robot Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2 INCAME Framework for Multi-Task on FPGAs . . . . . . . . . . . . . . . . . . . . . 113
7.2.1 Hardware Resource Conflicts in ROS . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2.2 Interruptible Accelerator with ROS (INCAME) . . . . . . . . . . . . . . . 115
7.3 Virtual Instruction-Based Accelerator Interrupt . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.1 Instruction Driven Accelerator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.2 How to Interrupt: Virtual Instruction . . . . . . . . . . . . . . . . . . . . . . . . 119
7.3.3 Where to Interrupt: After SAVE/CALC_F . . . . . . . . . . . . . . . . . . . 121
7.3.4 Latency Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3.5 Virtual Instruction ISA (VI-ISA) . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.3.6 Instruction Arrangement Unit (IAU) . . . . . . . . . . . . . . . . . . . . . . . . 125
7.3.7 Example of Virtual Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.4 Evaluation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.4.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.4.2 Virtual Instruction-Based Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4.3 ROS-Based MR-Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

8 Autonomous Vehicles Powered by FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133


8.1 The PerceptIn Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.2 Design Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.2.1 Overview of the Vehicle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.2.2 Performance Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.2.3 Energy and Cost Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.3 Software Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.4 On Vehicle Processing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.4.1 Hardware Design Space Exploration . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.4.2 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.4.3 Sensor Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.4.4 Performance Characterizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

9 Space Robots Powered by FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149


9.1 Radiation Tolerance for Space Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9.2 Space Robotic Algorithm Acceleration on FPGAs . . . . . . . . . . . . . . . . . . . . 151
9.2.1 Feature Detection and Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
9.2.2 Stereo Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.2.3 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.3 Utilization of FPGAs in Space Robotic Missions . . . . . . . . . . . . . . . . . . . . . 154
9.3.1 Mars Exploration Rover Missions . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.3.2 Mars Science Laboratory Mission . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3.3 Mars 2020 Mission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.1 What we Have Covered in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.2 Looking Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

Authors’ Biographies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201



Preface
In this book, we provide a thorough overview of the state-of-the-art FPGA-based robotic com-
puting accelerator designs and summarize their adopted optimized techniques. The authors
combined have over 40 years of research experience in utilizing FPGAs in robotic applications,
both in academic research and commercial deployments. For instance, the authors have demon-
strated that, by co-designing both the software and hardware, FPGAs can achieve more than 10×
better performance and energy efficiency compared to the CPU and GPU implementations. The
authors have also pioneered the utilization of the partial reconfiguration methodology in FPGA
implementations to further improve the design flexibility and reduce the overhead. In addition,
the authors have successfully developed and shipped commercial robotic products powered by FPGAs, and they demonstrate that FPGAs have excellent potential and are promising candidates for robotic computing acceleration due to their high reliability, adaptability, and power efficiency.
The authors believe that FPGAs are the best compute substrate for robotic applications
for several reasons. First, robotic algorithms are still evolving rapidly, and thus any ASIC-based
accelerators will be months or even years behind the state-of-the-art algorithms. On the other
hand, FPGAs can be dynamically updated as needed. Second, robotic workloads are highly di-
verse, thus it is difficult for any ASIC-based robotic computing accelerator to reach economies
of scale in the near future. On the other hand, FPGAs are a cost-effective and energy-effective
alternative before one type of accelerator reaches economies of scale. Third, compared to sys-
tems on a chip (SoCs) that have reached economies of scale, e.g., mobile SoCs, FPGAs deliver a
significant performance advantage. Fourth, partial reconfiguration allows multiple robotic work-
loads to time-share an FPGA, thus allowing one chip to serve multiple applications, leading to
overall cost and energy reduction.
Specifically, FPGAs require little power and are often built into small systems with limited memory. They are capable of massively parallel computation and can exploit the properties of perception (e.g., stereo matching), localization (e.g., simultaneous localization and mapping (SLAM)), and planning (e.g., graph search) kernels to remove redundant logic and simplify the end-to-end system implementation. Taking hardware characteristics into account, several algorithms have been proposed that run in a hardware-friendly way while achieving results similar to their software counterparts. Therefore, FPGAs can meet real-time requirements while achieving high energy efficiency compared to central processing units (CPUs) and graphics processing units (GPUs). In addition, unlike their application-specific integrated circuit
(ASIC) counterparts, FPGA technologies provide the flexibility of on-site programming and
re-programming without going through re-fabrication with a modified design. Partial Recon-
figuration (PR) takes this flexibility one step further, allowing the modification of an operating
FPGA design by loading a partial configuration file. Using PR, part of the FPGA can be recon-
figured at runtime without compromising the integrity of the applications running on those parts
of the device that are not being reconfigured. As a result, PR can allow different robotic applica-
tions to time-share part of an FPGA, leading to energy and performance efficiency and making FPGAs a suitable computing platform for dynamic and complex robotic workloads. Due to these advantages over other compute substrates, FPGAs have been successfully utilized in commercial autonomous vehicles as well as in space robotic applications, since FPGAs offer unprecedented flexibility and significantly reduce the design cycle and development cost.
This book consists of ten chapters, providing a thorough overview of how FPGAs have
been utilized in robotic perception, localization, planning, and multi-robot collaboration tasks.
In addition to individual robotic tasks, we provide detailed descriptions of how FPGAs have
been used in robotic products, including commercial autonomous vehicles and space exploration
robots.

Shaoshan Liu
June 2021

CHAPTER 1

Introduction and Overview


The last decade has seen significant progress in the development of robotics, spanning from algorithms and mechanics to hardware platforms. Various robotic systems, like manipulators, legged robots, unmanned aerial vehicles, and self-driving cars, have been designed for search and rescue [1, 2], exploration [3, 4], package delivery [5], entertainment [6, 7], and many more applications and scenarios. These robots are beginning to demonstrate their full potential. Take drones, a type of aerial robot, as an example. The number of drones grew by 2.83x between 2015 and 2019 according to the U.S. Federal Aviation Administration (FAA) report [8]. The registered number reached 1.32 million in 2019, and the FAA expects this number to grow to 1.59 million by 2024.
However, robotic systems are very complex [9–12]. They tightly integrate many tech-
nologies and algorithms, including sensing, perception, mapping, localization, decision making,
control, etc. This complexity poses many challenges for the design of robotic edge computing
systems [13, 14]. On the one hand, robotic systems need to process an enormous amount of
data in real-time. The incoming data often comes from multiple sensors, which is highly het-
erogeneous and requires accurate spatial and temporal synchronization and pre-processing [15].
However, the robotic system usually has limited on-board resources, such as memory storage,
bandwidth, and compute capabilities, making it hard to meet the real-time requirements. On
the other hand, the current state-of-the-art robotic system usually has strict power constraints
on the edge that cannot support the amount of computation required for performing tasks, such
as 3D sensing, localization, navigation, and path planning. Therefore, the computation and stor-
age complexity, as well as real-time and power constraints of the robotic system, hinder its wide
application in latency-critical or power-limited scenarios [16].
Therefore, it is essential to choose a proper compute platform for robotic systems. CPUs
and GPUs are two widely used commercial compute platforms. CPUs are designed to handle
a wide range of tasks quickly and are often used to develop novel algorithms. A typical CPU
can achieve 10–100 GFLOPS with below 1 GOP/J power efficiency [17]. In contrast, GPUs
are designed with thousands of processor cores running simultaneously, which enable massive
parallelism. A typical GPU can deliver up to 10 TOPS and is a good candidate for high-performance scenarios. Recently, benefiting in part from the better accessibility provided by CUDA/OpenCL, GPUs have been predominantly used in many robotic applications. However, conventional CPUs and GPUs usually consume 10–100 W of power, which is orders of magnitude higher than what is available on a resource-limited robotic system.
Besides CPUs and GPUs, FPGAs are attracting attention and becoming compute sub-
strate candidates to achieve energy-efficient robotic task processing. FPGAs require low power
and are often built into small systems with limited memory. They can perform massively parallel computations and exploit the properties of perception (e.g., stereo matching), localization (e.g., SLAM), and planning (e.g., graph search) kernels to remove redundant logic and simplify the implementation. Taking hardware characteristics into account, researchers and engineers have proposed several algorithms that run in a hardware-friendly way while achieving results similar to their software counterparts. Therefore, FPGAs can meet real-time requirements while achieving high energy efficiency compared to CPUs and GPUs.
Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and
re-programming without going through re-fabrication with a modified design. Partial Recon-
figuration (PR) takes this flexibility one step further, allowing the modification of an operating
FPGA design by loading a partial configuration file. By using PR, part of the FPGA can be
reconfigured at runtime without compromising the integrity of the applications running on the
parts of the device that are not being reconfigured. As a result, PR can allow different robotic
applications to time-share part of an FPGA, leading to energy and performance efficiency, and
making FPGA a suitable computing platform for dynamic and complex robotic workloads.
Note that robotics is not one technology but rather an integration of many technologies.
As shown in Fig. 1.1, the stack of the robotic system consists of three major components: ap-
plication workloads, including sensing, perception, localization, motion planning, and control;
a software edge subsystem, including operating system and runtime layer; and computing hard-
ware, including both micro-controllers and companion computers [16, 18, 19].
We focus on the robotic application workloads in this chapter. The application subsystem
contains multiple algorithms that are used by the robot to extract meaningful information from
raw sensor data to understand the environment and dynamically make decisions about its actions.

1.1 SENSING

The sensing stage is responsible for extracting meaningful information from the sensor raw data.
To enable intelligent actions and improve reliability, the robot platform usually supports a wide
range of sensors. The number and type of sensors are heavily dependent on the specifications of
the workload and the capability of the onboard compute platform. The sensors can include the
following:
Cameras. Cameras are usually used for object recognition and object tracking, such as lane detection in autonomous vehicles and obstacle detection in drones. RGB-D cameras can also be utilized to determine object distances and positions. Taking autonomous vehicles as an example, the current system usually mounts eight or more 1080p cameras around the vehicle to detect, recognize, and track objects in different directions, which can greatly improve safety. These cameras usually run at 60 Hz and, when combined, produce multiple gigabytes of raw image data per second.
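As a rough sanity check on that figure, the short sketch below computes the raw data rate for the setup described above, assuming uncompressed 8-bit RGB frames (3 bytes per pixel); the exact rate depends on the pixel format and any on-sensor compression.

```python
# Back-of-the-envelope raw data rate for eight 1080p cameras at 60 Hz,
# assuming uncompressed 8-bit RGB frames (3 bytes per pixel).
cameras = 8
width, height = 1920, 1080
bytes_per_pixel = 3
fps = 60

rate_bytes_per_s = cameras * width * height * bytes_per_pixel * fps
print(f"{rate_bytes_per_s / 1e9:.1f} GB/s of raw image data")  # ~3.0 GB/s
```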
GNSS/IMU. The global navigation satellite system (GNSS) and inertial measurement
unit (IMU) system help the robot localize itself by reporting both inertial updates and an es-
timate of the global location at a high rate. Different robots have different requirements for
localization sensing. For instance, 10 Hz may be enough for a low-speed mobile robot, but
high-speed autonomous vehicles usually demand 30 Hz or higher for localization, and high-
speed drones may need 100 Hz or more for localization, thus we are facing a wide spectrum of
sensing speeds. Fortunately, different sensors have their advantages and drawbacks. GNSS can
enable fairly accurate localization, but it runs at only 10 Hz and thus cannot provide real-time updates. By contrast, both the accelerometer and the gyroscope in an IMU can run at 100–200 Hz, which satisfies the real-time requirement. However, the IMU suffers from bias drift over time and perturbation by thermo-mechanical noise, which may degrade the accuracy of the position estimates. By combining GNSS and IMU, we can get accurate and real-time updates
for robots.
LiDAR. Light detection and ranging (LiDAR) measures distance by illuminating obstacles with laser light and measuring the reflection time. These pulses, along with other recorded data, can generate precise, three-dimensional information about the surroundings. LiDAR plays an important role in localization, obstacle detection, and avoidance. As indicated in [20], the choice of sensors dictates the algorithm and hardware design. Taking autonomous driving as an instance, almost all autonomous vehicle companies, including Uber, Waymo, and Baidu, use LiDAR at the core of their technologies. PerceptIn and Tesla are among the very few that do not use LiDAR and, instead, rely on cameras and vision-based systems. In particular, PerceptIn's data demonstrated that for the low-speed autonomous driving scenario, LiDAR processing is slower than camera-based vision processing and also increases power consumption and cost.
Radar and Sonar. The Radio Detection and Ranging (Radar) and Sound Navigation and Ranging (Sonar) systems are used to determine the distance and speed to a certain object, and usually serve as the last line of defense to avoid obstacles. In an autonomous vehicle, for example, when nearby obstacles are detected and a collision is imminent, the vehicle applies the brakes or turns to avoid them. Compared to LiDAR, the Radar and Sonar systems are cheaper and smaller, and their raw data is usually fed directly to the control processor without going through the main compute pipeline, so it can be used to implement urgent functions such as swerving or applying the brakes.
One key problem we have observed with commercial CPUs, GPUs, and mobile SoCs is the lack of built-in multi-sensor processing support; hence, most of the multi-sensor processing has to be done in software, which can lead to problems such as poor time synchronization. On the other hand, FPGAs provide a rich sensor interface and enable most time-critical sensor
processing tasks to be done in hardware [21]. In Chapter 2, we introduce FPGA technologies, especially how FPGAs provide rich I/O blocks, which can be configured for heterogeneous sensor processing.

Figure 1.1: The stack of the robotic system: application workloads spanning sensing (GPS/IMU, LiDAR, camera, Radar/Sonar), perception (mapping, localization, object detection, object tracking), and decision (path planning, action prediction, obstacle avoidance, feedback control), running on top of the operating system and the hardware platform.

1.2 PERCEPTION
The sensor data is then fed into the perception layer to sense the static and dynamic objects as
well as build a reliable and detailed representation of the robot’s environment by using computer
vision techniques (including deep learning).
The perception layer is responsible for object detection, segmentation, and tracking. There
are obstacles, lane dividers, and other objects to detect. Traditionally, a detection pipeline starts
with image pre-processing, followed by a region of interest detector, and finally a classifier that
outputs detected objects. In 2005, Dalal and Triggs [22] proposed an algorithm based on the histogram of oriented gradients (HOG) and a support vector machine (SVM) to model both the appearance and shape of the object under various conditions. The goal of segmentation is to give
the robot a structured understanding of its environment. Semantic segmentation is usually for-
mulated as a graph labeling problem with vertices of the graph being pixels or super-pixels.
Inference algorithms on graphical models such as conditional random field (CRF) [23, 24] are
used. The goal of tracking is to estimate the trajectory of moving obstacles. Tracking can be
formulated as a sequential Bayesian filtering problem by recursively running the prediction step
and correction step. Tracking can also be formulated as tracking-by-detection with a Markov decision process (MDP) [25], where an object detector is applied to consecutive frames and detected objects are linked across frames.
In recent years, deep neural networks (DNNs), also known as deep learning, have greatly
affected the field of computer vision and made significant progress in solving robot percep-
tion problems. Most state-of-the-art algorithms now apply some form of convolutional neural network. Fast R-CNN [26], Faster R-CNN [27], SSD [28], YOLO [29], and YOLO9000 [30] have been used to achieve much better speed and accuracy in object detection. Most CNN-based semantic segmentation work is based on Fully Convolutional Networks (FCNs) [31], and there is recent work on spatial pyramid pooling networks [32] and the pyramid scene parsing network (PSPNet) [33] that combines global image-level information with locally extracted features. By using auxiliary natural images, a stacked autoencoder model can be trained offline to learn generic image features and then applied to online object tracking [34].
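To make the detection stage above concrete, here is a minimal inference sketch using an off-the-shelf Faster R-CNN from torchvision (assuming torchvision 0.13 or newer for the weights argument); the image file name is a hypothetical placeholder, and this is a plain software baseline rather than any of the FPGA accelerator designs covered in Chapter 3.

```python
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Load a pretrained two-stage detector and switch to inference mode.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# One camera frame (hypothetical file name), converted to a CHW float tensor in [0, 1].
image = to_tensor(Image.open("camera_frame.jpg").convert("RGB"))

with torch.no_grad():
    prediction = model([image])[0]   # the model returns one dict per input image

# Keep detections above a confidence threshold and print box coordinates.
keep = prediction["scores"] > 0.5
for box, label, score in zip(prediction["boxes"][keep],
                             prediction["labels"][keep],
                             prediction["scores"][keep]):
    print(int(label), float(score), [round(v, 1) for v in box.tolist()])
```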
In Chapter 3, we review the state-of-the-art neural network accelerator designs and
demonstrate that with software-hardware co-design, FPGAs can achieve more than 10 times
better speed and energy efficiency than the state-of-the-art GPUs. This verifies that FPGAs are
a promising candidate for neural network acceleration. In Chapter 4, we review various stereo vision algorithms used in robotic perception and their FPGA accelerator designs. We demonstrate that with careful algorithm-hardware co-design, FPGAs can achieve two orders of magnitude higher energy efficiency and performance than state-of-the-art GPUs and CPUs.

1.3 LOCALIZATION
The localization layer is responsible for aggregating data from various sensors to locate the robot
in the environment model.
The GNSS/IMU system is used for localization. GNSS consists of several satellite systems, such as GPS, Galileo, and BeiDou, which can provide accurate localization results but at a slow update rate. In comparison, the IMU can provide fast updates with less accurate rotation and acceleration estimates. A mathematical filter, such as the Kalman filter, can be used to combine the advantages of the two and minimize the localization error and latency. However, this system alone has problems: the signal may bounce off obstacles and introduce more noise, and it fails to work in enclosed environments.
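As a concrete illustration of such a filter, the sketch below fuses a 100 Hz IMU-style acceleration input with a 10 Hz GNSS-style position fix in a 1-D Kalman filter. The update rates follow the discussion above, while the noise levels and the simulated trajectory are hypothetical.

```python
import numpy as np

dt = 0.01                                   # 100 Hz IMU period
F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition for [position, velocity]
B = np.array([[0.5 * dt ** 2], [dt]])       # how measured acceleration enters the state
H = np.array([[1.0, 0.0]])                  # GNSS observes position only
Q = np.eye(2) * 1e-4                        # process noise (IMU bias, vibration)
R = np.array([[4.0]])                       # GNSS position noise (m^2)

x = np.zeros((2, 1))                        # estimate [position; velocity]
P = np.eye(2)                               # estimate covariance

def imu_predict(x, P, accel):
    """Propagate the state with one IMU acceleration sample."""
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    return x, P

def gnss_correct(x, P, z):
    """Correct the accumulated drift with one GNSS position fix."""
    y = np.array([[z]]) - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

rng = np.random.default_rng(0)
true_pos, true_vel = 0.0, 0.0
for k in range(1000):                       # 10 s of motion with constant acceleration
    accel = 0.1
    true_vel += accel * dt
    true_pos += true_vel * dt
    x, P = imu_predict(x, P, accel + rng.normal(0.0, 0.05))   # noisy IMU at 100 Hz
    if k % 10 == 0:                                           # GNSS fix at 10 Hz
        x, P = gnss_correct(x, P, true_pos + rng.normal(0.0, 2.0))

print("estimate [pos, vel]:", x.ravel(), "truth:", (round(true_pos, 2), round(true_vel, 2)))
```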
LiDAR and High-Definition (HD) maps are also used for localization. LiDAR can generate point clouds and provide a shape description of the environment, although it is hard to differentiate individual points. An HD map has a higher resolution than ordinary digital maps and makes the route familiar to the robot; the key is to fuse information from different sensors to minimize the error in each grid cell. Once the HD map is built, a particle filter method can be applied to localize the robot in real time by correlating against LiDAR measurements. However, LiDAR performance may be severely affected by weather conditions (e.g., rain, snow), which introduces localization error.
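The toy sketch below shows the particle filter idea in one dimension: particles are propagated by the motion command, weighted by how well a range measurement to a known map feature (a wall, standing in for an HD-map grid cell) matches each particle, and then resampled. The corridor length, landmark position, and noise levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
particles = rng.uniform(0.0, 10.0, N)        # initial belief: anywhere in a 10 m corridor
WALL = 10.0                                  # known map feature at the end of the corridor

def pf_step(particles, control, z, motion_std=0.05, meas_std=0.2):
    """One predict / update / resample cycle of the particle filter."""
    # Predict: apply the motion command plus process noise.
    particles = particles + control + rng.normal(0.0, motion_std, particles.size)
    # Update: weight particles by the likelihood of the measured range to the wall.
    expected_range = WALL - particles
    weights = np.exp(-0.5 * ((z - expected_range) / meas_std) ** 2) + 1e-300
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(particles.size, particles.size, p=weights)
    return particles[idx]

true_pose = 2.0
for _ in range(20):
    true_pose += 0.1                                  # robot advances 0.1 m per step
    z = (WALL - true_pose) + rng.normal(0.0, 0.2)     # noisy range measurement
    particles = pf_step(particles, 0.1, z)

print("estimated pose:", round(particles.mean(), 2), "true pose:", round(true_pose, 2))
```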
Cameras are used for localization as well. The pipeline of vision-based localization, simplified, is as follows: (1) by triangulating stereo image pairs, a disparity map is obtained and used to derive depth information for each point; (2) by matching salient features between successive stereo image frames to establish correspondences between feature points in different frames, the motion between the past two frames is estimated; and (3) by comparing the salient features against those in the known map, the current position of the robot is derived [35].
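Step (1) of this pipeline can be sketched with OpenCV's semi-global matcher on a rectified stereo pair, converting disparity d to depth via depth = f * B / d. The image file names, focal length, and baseline below are hypothetical placeholders for a calibrated stereo rig.

```python
import cv2
import numpy as np

# Rectified left/right images from a stereo rig (hypothetical file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; compute() returns fixed-point disparities scaled by 16.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

focal_length_px = 700.0     # focal length in pixels, from calibration (assumed)
baseline_m = 0.12           # distance between the two cameras in meters (assumed)

# depth = f * B / d, valid only where a disparity was found.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_length_px * baseline_m / disparity[valid]
print("median scene depth (m):", float(np.median(depth[valid])))
```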
Apart from these techniques, a sensor fusion strategy is also often utilized to combine multiple sensors for localization, which can improve the reliability and robustness of the robot [36, 37].
In Chapter 5, we introduce a general-purpose localization framework that integrates key primitives of existing algorithms, along with its implementation on an FPGA. The FPGA-based localization framework retains the high accuracy of the individual algorithms, simplifies the software stack, and provides a desirable acceleration target.

1.4 PLANNING AND CONTROL


The planning and control layer is responsible for generating trajectory plans and issuing the control commands based on the origin and destination of the robot. Broadly, prediction and routing modules are also included here, and their outputs are fed into the downstream planning and control layers as input. The prediction module is responsible for predicting the future behavior of surrounding objects identified by the perception layer. The routing module can be a lane-level router based on the lane segmentation of HD maps for autonomous vehicles.
Planning and control layers usually include behavioral decision, motion planning, and feedback control. The mission of the behavioral decision module is to make effective and safe decisions by leveraging the various input data sources. Bayesian models are becoming increasingly popular and have been applied in recent works [38, 39]. Among the Bayesian models, the Markov Decision Process (MDP) and the Partially Observable Markov Decision Process (POMDP) are widely applied methods for modeling robot behavior. The task of motion planning is to generate a trajectory and send it to the feedback control for execution. The planned trajectory is usually specified and represented as a sequence of planned trajectory points, and each of these points contains attributes like location, time, speed, etc. Low-dimensional motion planning problems can be solved with grid-based algorithms (such as Dijkstra [40] or A* [41]) or geometric algorithms; a minimal grid-search sketch follows below. High-dimensional motion planning problems can be handled with sampling-based algorithms, such as Rapidly exploring Random Tree (RRT) [42] and Probabilistic Roadmap (PRM) [43], which can avoid the problem of local minima. Reward-based algorithms, such as the Markov decision process (MDP), can also generate the optimal path by maximizing cumulative future rewards. The goal of feedback control is to track the difference between the actual pose and the pose on the predefined trajectory through continuous feedback. The most typical and widely used algorithm in robot feedback control is PID.
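The sketch below is the kind of grid-based search just mentioned: a textbook A* with a Manhattan-distance heuristic on a tiny hypothetical occupancy grid (0 = free cell, 1 = obstacle). It is a plain software reference for the graph-search kernel revisited in Chapter 6, not an FPGA design.

```python
import heapq
from itertools import count

def astar(grid, start, goal):
    """A* over a 4-connected occupancy grid; returns the path as a list of cells."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])   # Manhattan heuristic
    tie = count()                                             # tie-breaker for the heap
    open_set = [(h(start), next(tie), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, _, g, cell, parent = heapq.heappop(open_set)
        if cell in came_from:               # already expanded with an equal or better cost
            continue
        came_from[cell] = parent
        if cell == goal:                    # reconstruct the path back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), next(tie), ng, (nr, nc), cell))
    return None                             # goal unreachable

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))
```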
While optimization-based approaches enjoy mainstream appeal in solving motion plan-
ning and control problems, learning-based approaches [44–48] are becoming increasingly pop-
ular with recent developments in artificial intelligence. Learning-based methods, such as rein-
forcement learning, can naturally make full use of historical data and iteratively interact with the
environment through actions to deal with complex scenarios. Some works model behavioral-level
decisions via reinforcement learning [46, 48], while other approaches directly work on motion
planning trajectory output or even direct feedback control signals [45]. Q-learning [49], Actor-
Critic learning [50], and policy gradient [43] are some popular algorithms in reinforcement
learning.
In Chapter 6, we introduce the motion planning modules of the robotics system, and
compare several FPGA and ASIC accelerator designs in motion planning to analyze intrinsic
design trade-offs. We demonstrate that with careful algorithm-hardware co-design, FPGAs can achieve three orders of magnitude better performance than CPUs and two orders of magnitude better performance than GPUs, with much lower power consumption. This demonstrates that FPGAs can be a promising candidate
for accelerating motion planning kernels.

1.5 FPGAS IN ROBOTIC APPLICATIONS


Besides accelerating the basic modules in the robotic computing stack, FPGAs have been uti-
lized in different robotic applications. In Chapter 7, we explore how FPGAs can be utilized
in multi-robot exploration tasks. Specifically, we present an FPGA-based interruptible CNN
accelerator and a deployment framework for multi-robot exploration.
In Chapter 8, we provide a retrospective summary of PerceptIn’s efforts on developing
on-vehicle computing systems for autonomous vehicles, especially how FPGAs are utilized to
accelerate critical tasks in a full autonomous driving stack. For instance, localization is acceler-
ated on an FPGA while depth estimation and object detection are accelerated by a GPU. This
case study has demonstrated that FPGAs are capable of playing a crucial role in autonomous
driving, and exploiting accelerator-level parallelism while taking into account constraints arising
in different contexts could significantly improve on-vehicle processing.
In Chapter 9, we explore how FPGAs have been utilized in space robotic applications in
the past two decades. The properties of FPGAs make them good onboard processors for space
missions, ones that have high reliability, adaptability, processing power, and energy efficiency.
FPGAs may help us close the two-decade performance gap between commercial processors and
space-grade ASICs when it comes to powering space exploration robots.

1.6 THE DEEP PROCESSING PIPELINE


Different from other computing workloads, autonomous machines have a very deep processing
pipeline with strong dependencies between different stages and a strict time-bound associated
with each stage [51]. For instance, Fig. 1.2 presents an overview of the processing pipeline of an
autonomous driving system. Starting from the left side, the system consumes raw sensing data
from mmWave radars, LiDARs, cameras, and GNSS/IMUs, and each sensor produces raw data
at a different frequency. The cameras capture images at 30 FPS and feed the raw data to the 2D
Perception module, the LiDARs capture point clouds at 10 FPS and feed the raw data to the
3D Perception module as well as the Localization module, the GNSS/IMUs generate positional updates at 100 Hz and feed the raw data to the Localization module, and the mmWave radars detect obstacles at 10 FPS and feed the raw data to the Perception Fusion module.

Figure 1.2: The processing pipeline of autonomous vehicles: raw data from the mmWave radar (10 Hz), camera (30 Hz), LiDAR (10 Hz), and GNSS/IMU (100 Hz) flows through the 2D/3D Perception, Perception Fusion, Tracking, Prediction, Planning, and Control modules, with Localization running at 100 Hz and control commands issued to the vehicle chassis at 100 Hz.
Next, the results of 2D and 3D Perception Modules are fed into the Perception Fusion
module at 30 Hz and 10 Hz, respectively, to create a comprehensive perception list of all detected
objects. The perception list is then fed into the Tracking module at 10 Hz to create a tracking list
of all detected objects. The tracking list is then fed into the Prediction module at 10 Hz to create
a prediction list of all objects. After that, both the prediction results and the localization results
are fed into the Planning module at 10 Hz to generate a navigation plan. The navigation plan is
then fed into the Control module at 10 Hz to generate control commands, which are finally sent
to the autonomous vehicle for execution at 100 Hz.
Hence, every 10 ms, the autonomous vehicle needs to generate a control command to
maneuver the vehicle. If any upstream module, such as the Perception module, misses the deadline
to generate an output, the Control module still has to generate a command before the deadline.
This could lead to disastrous results as the autonomous vehicle is essentially driving blindly
without the perception output.
The key challenge is to design a system to minimize the end-to-end latency of the deep
processing pipeline within energy and cost constraints, and with minimum latency variation.
In this book, we demonstrate that FPGAs can be utilized in different modules in this long
processing pipeline to minimize latency, reduce latency variation, and achieve energy efficiency.
1.7 SUMMARY
The authors believe that FPGAs are the indispensable compute substrate for robotic applications
for several reasons.
• First, robotic algorithms are still evolving rapidly. Thus, any ASIC-based accelerators
will be months or even years behind the state-of-the-art algorithms; on the other hand,
FPGAs can be dynamically updated as needed.
• Second, robotic workloads are highly diverse. Thus, it is difficult for any ASIC-based
robotic computing accelerator to reach economies of scale in the near future; on the
other hand, FPGAs are a cost-effective and energy-effective alternative before one type
of accelerator reaches economies of scale.
• Third, compared to SoCs that have reached economies of scale, e.g., mobile SoCs, FPGAs deliver a significant performance advantage.
• Fourth, partial reconfiguration allows multiple robotic workloads to time-share an
FPGA, thus allowing one chip to serve multiple applications, leading to overall cost
and energy reduction.
Specifically, FPGAs require little power and are often built into small systems with limited memory. They can perform massively parallel computations and make use of the properties of perception (e.g., stereo matching), localization (e.g., SLAM), and planning (e.g., graph search) kernels to remove redundant logic and simplify the implementation. Taking hardware characteristics into account, several algorithms have been proposed that run in a hardware-friendly way while achieving results similar to their software counterparts. Therefore, FPGAs can meet real-time requirements while achieving high energy efficiency compared to CPUs and GPUs.
Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and re-
programming without going through re-fabrication with a modified design. Partial reconfiguration (PR)
takes this flexibility one step further, allowing the modification of an operating FPGA design by loading a
partial configuration file. Using PR, part of the FPGA can be reconfigured at runtime without
compromising the integrity of the applications running on those parts of the device that are
not being reconfigured. As a result, PR can allow different robotic applications to time-share
part of an FPGA, leading to energy and performance efficiency, and making FPGA a suitable
computing platform for dynamic and complex robotic workloads.
Due to the advantages over other compute substrates, FPGAs have been successfully uti-
lized in commercial autonomous vehicles. Particularly, over the past four years, PerceptIn has
built and commercialized autonomous vehicles for micromobility, and PerceptIn’s products have
been deployed in China, the U.S., Japan, and Switzerland. In this book, we provide a real-world
case study on how PerceptIn developed its computing system by relying heavily on FPGAs,
which perform not only heterogeneous sensor synchronizations but also the acceleration of soft-
ware components on the critical path. In addition, FPGAs are used heavily in space robotic
applications, because FPGAs offer unprecedented flexibility and significantly reduce the design
cycle and development cost.

CHAPTER 2

FPGA Technologies
Before we delve into utilizing FPGAs for accelerating robotic workloads, in this chapter we
first provide the background of FPGA technologies so that readers without prior knowledge
can grasp the basic understanding of what an FPGA is and how an FPGA works. We also
introduce partial reconfiguration, a technique that exploits the flexibility of FPGAs and one
that is extremely useful for various robotic workloads to time-share an FPGA so as to minimize
energy consumption and resource utilization. In addition, we explore existing techniques that
enable the robot operating system (ROS), an essential infrastructure for robotic computing, to
run directly on FPGAs.

2.1 AN INTRODUCTION TO FPGA TECHNOLOGIES


In the 1980s, FPGAs emerged as a result of increasing integration in electronics. Before the use
of FPGAs, glue-logic designs were based on individual boards with fixed components intercon-
nected via a shared standard bus. This approach had various drawbacks: it hindered high-volume
data processing, was more susceptible to radiation-induced errors, and was inflexible.
In detail, FPGAs are semiconductor devices that are based around a matrix of config-
urable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be repro-
grammed to desired application or functionality requirements after manufacturing. This feature
distinguishes FPGAs from Application-Specific Integrated Circuits (ASICs), which are custom
manufactured for specific design tasks.
Note that ASICs and FPGAs have different value propositions, and they must be care-
fully evaluated before choosing any one over the other. While FPGAs used to be selected for
lower-speed/complexity/volume designs in the past, today’s FPGAs easily push the 500 MHz
performance barrier. With unprecedented logic density increases and a host of other features,
such as embedded processors, DSP blocks, clocking, and high-speed serial at ever lower price
points, FPGAs are a compelling proposition for almost any type of design.
Modern FPGAs offer massive reconfigurable logic and memory, which let engineers
build dedicated hardware with superb power and performance efficiency. In particular, FPGAs
are attracting attention from the robotics community and becoming an energy-efficient platform
for robotic computing. Unlike their ASIC counterparts, FPGAs provide the flexibility of on-site
programming and re-programming without going through re-fabrication with a modified design,
thanks to their underlying reconfigurable fabric.
2.1.1 TYPES OF FPGAS
FPGAs can be categorized by the type of their programmable interconnection switches: antifuse,
SRAM, and Flash. Each of the three technologies comes with trade-offs.

• Antifuse FPGAs are non-volatile and have a minimal delay due to routing, resulting
in a faster speed and lower power consumption. The drawback is evident as they have a
relatively more complicated fabrication process and are only one-time programmable.

• SRAM-based FPGAs are field reprogrammable and use the standard fabrication pro-
cess that foundries have put significant effort into optimizing, resulting in a faster rate of
performance increase. However, based on SRAM, these FPGAs are volatile and may
not hold configuration if a power glitch occurs. Also, they have more substantial rout-
ing delays, require more power, and have a higher susceptibility to bit errors. Note that
SRAM-based FPGAs are the most popular compute substrates in space applications.

• Flash-based FPGAs are non-volatile and reprogrammable, and also have low power
consumption and routing delay. The major drawback is that runtime reconfiguration is
not recommended for flash-based FPGAs due to the potentially destructive results if
radiation effects occur during the reconfiguration process [52]. Also, the stability of the
charge stored on the floating gate is a concern: it depends on factors such as the operating
temperature and electric fields that might disturb the charge. As a result, flash-based
FPGAs are not as frequently used in space missions [53].

2.1.2 FPGA ARCHITECTURE


In this subsection, we introduce the basic components in FPGA architecture in the hope of
providing basic background knowledge to readers with limited prior knowledge on FPGA tech-
nologies. For a detailed and thorough explanation, interested readers can refer to [54].
As shown in Fig. 2.1, a basic FPGA design usually contains the following components.

• Configurable Logic Blocks (CLBs) are the basic repeating logic resources on an
FPGA. When linked together by the programmable routing blocks, CLBs can exe-
cute complex logic functions, implement memory functions, and synchronize code on
the FPGA. CLBs contain smaller components, including flip-flops (FFs), look-up ta-
bles (LUTs), and multiplexers (MUX). An FF is the smallest storage resource on the
FPGA. Each FF in a CLB is a binary register used to save logic states between clock
cycles on an FPGA circuit. An LUT stores a predefined list of outputs for every com-
bination of inputs. LUTs provide a fast way to retrieve the output of a logic operation
because possible results are stored and then referenced rather than calculated. A MUX
is a circuit that selects between two or more inputs and then returns the selected input.
Any logic can be implemented using a combination of FFs, LUTs, and MUXes (a minimal software sketch of a 4-input LUT follows this list).
Figure 2.1: Overview of FPGA architecture (configurable logic blocks built from 4-input look-up tables, flip-flops, and multiplexers set by the SRAM configuration bitstream; interconnect wires and programmable routing blocks; DSP blocks; and I/O blocks with input/output registers and three-state pads).

• Programmable Routing Blocks (PRBs) provide programmability for connectivity
among a pool of CLBs. The interconnection network contains configurable switch ma-
trices and connection blocks that can be programmed to form the demanded connec-
tion. PRBs can be divided into Connection Blocks (CBs) and a matrix of Switch Boxes
(SBs), namely, the Switch Matrix (SM). CBs are responsible for connecting CLB
input/output pins to the adjacent routing channels. SBs are placed at the
intersection points of vertical and horizontal routing channels. Routing a net from a
CLB source to a CLB target necessitates passing through multiple interconnect wires
and SBs, in which an entering signal from a certain side can connect to any of the other
three directions based on the SM topology.

• I/O Blocks (IOBs) are used to bridge signals onto the chip and send them back off
again. An IOB consists of an input buffer and an output buffer with three-state and
open-collector output controls. Typically, there are pull-up resistors on the outputs and
sometimes pull-down resistors that can be used to terminate signals and buses without
requiring discrete resistors external to the chip. The polarity of the output can usually
be programmed for active high or active low output. There are typically flip-flops on
outputs so that clocked signals can be output directly to the pins without encountering
significant delay, more easily meeting the setup time requirement for external devices.
Since there are many IOBs available on an FPGA and these IOBs are programmable,
we can easily design a compute system to connect to different types of sensors, which
are extremely useful in robotic workloads.
• Digital Signal Processors (DSPs) have been optimized to implement various com-
mon digital signal processing functions with maximum performance and minimum
logic resource utilization. In addition to multipliers, each DSP block has functions
that are frequently required in typical DSP algorithms. These functions usually in-
clude pre-adders, adders, subtractors, accumulators, coefficient register storage, and a
summation unit. With these rich features, the DSP blocks in the Stratix series FP-
GAs are ideal for applications with high-performance and computationally intensive
signal processing functions, such as finite impulse response (FIR) filtering, fast Fourier
transforms (FFTs), digital up/down conversion, high-definition (HD) video process-
ing, HD CODECs, etc. Besides the aforementioned traditional workloads, DSPs are
also extremely useful for robotic workloads, especially computer vision workloads, pro-
viding high-performance and low-power solutions for robotic vision front ends [55].
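To make the look-up table idea concrete, the following C++ sketch models a 4-input LUT as a 16-entry truth table; it is a software analogy only, not a description of the silicon. The four inputs form an index, and the configuration bitstream simply fills in the stored output bits.

#include <bitset>

// Software model of a 4-input LUT: 16 stored output bits, one per input combination.
struct Lut4 {
    std::bitset<16> truth_table;   // filled in by the configuration bitstream

    bool evaluate(bool a, bool b, bool c, bool d) const {
        unsigned idx = (a ? 1u : 0u) | (b ? 2u : 0u) | (c ? 4u : 0u) | (d ? 8u : 0u);
        return truth_table[idx];   // look up the result instead of computing it
    }
};

// Example: configure the LUT as a 4-input AND by setting only entry 15 (all inputs high).
// Lut4 and_gate; and_gate.truth_table.set(15);

In a real CLB, the LUT output can either drive the routing fabric directly or be latched in the adjacent flip-flop, which is how combinational and sequential logic are combined within one block.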

2.1.3 COMMERCIAL APPLICATIONS OF FPGAS


Due to their programmable nature, FPGAs are an ideal fit for many different markets such as
the following.
• Aerospace & Defense – Radiation-tolerant FPGAs along with the intellectual
property for image processing, waveform generation, and partial reconfiguration for
Software-Defined Radios, especially for space and defense applications.
• ASIC Prototyping – ASIC prototyping with FPGAs enables fast and accurate SoC
system modeling and verification of embedded software.
• Automotive – FPGAs enable automotive silicon and IP solutions for gateway and
driver assistance systems, as well as comfort, convenience, and in-vehicle infotainment.
• Consumer Electronics – FPGAs provide cost-effective solutions enabling next-
generation, full-featured consumer applications, such as converged handsets, digital
flat panel displays, information appliances, home networking, and residential set top
boxes.
• Data Center – FPGAs have been utilized heavily for high-bandwidth, low-latency
servers, networking, and storage applications to bring higher value into cloud deploy-
ments.
• High-Performance Computing and Data Storage – FPGAs have been utilized
widely for Network Attached Storage (NAS), Storage Area Network (SAN), servers,
and storage appliances.
• Industrial – FPGAs have been utilized in targeted design platforms for Industrial, Sci-
entific, and Medical (ISM) applications, enabling higher degrees of flexibility, faster
time-to-market, and lower overall non-recurring engineering (NRE) costs for a wide
range of applications such as industrial imaging and surveillance, industrial automation,
and medical imaging equipment.

• Medical – For diagnostic, monitoring, and therapy applications, FPGAs have been
used to meet a range of processing, display, and I/O interface requirements.

• Security – FPGAs offer solutions that meet the evolving needs of security applications,
from access control to surveillance and safety systems.

• Video & Image Processing – FPGAs have been utilized in targeted design platforms
to enable higher degrees of flexibility, faster time-to-market, and lower overall non-
recurring engineering costs (NRE) for a wide range of video and imaging applications.

• Wired Communications – FPGAs have been utilized to develop end-to-end solutions
for the Reprogrammable Networking Linecard Packet Processing, Framer/MAC, se-
rial backplanes, and more.

• Wireless Communications – FPGAs have been utilized to develop RF, base band,
connectivity, transport, and networking solutions for wireless equipment, addressing
standards such as WCDMA, HSDPA, WiMAX, and others.

In the rest of this book, we explore robotic computing, an emerging and potentially killer
application for FPGAs. With FPGAs, we can develop low-power, high-performance, cost-
effective, and flexible compute systems for various robotic workloads. Due to the advantages
provided by FPGAs, we expect that robotic applications will be a major demand driver for
FPGAs in the near future.

2.2 PARTIAL RECONFIGURATION


Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and re-
programming without going through re-fabrication with a modified design. PR takes this flex-
ibility one step further, allowing the modification of an operating FPGA design by loading a
PR file. Using PR, part of the FPGA can be reconfigured at runtime without compromising the
integrity of the applications running on those parts of the device that are not being reconfigured.
As a result, PR can allow different robotic applications to time-share part of an FPGA, leading
to energy and performance efficiency, and making FPGAs suitable computing platforms for
dynamic and complex robotic workloads.
2.2.1 WHAT IS PARTIAL RECONFIGURATION?
The obvious benefit of using reconfigurable devices, such as FPGAs, is that the functionality that
a device has now can be changed and updated at some time in the future. As additional func-
tionality is available or design improvements are made available, the FPGA can be completely
reprogrammed with new logic. PR takes this capability one step further by allowing designers
to change the logic within a part of an FPGA without disrupting the entire system. This allows
designers to divide their system into modules, each comprised of one block of logic and, without
disrupting the whole system and stopping the flow of data, the users can update the functionality
within one block.
Runtime partial reconfiguration (RPR) is a special feature offered by many FPGAs that
allows designers to reconfigure certain portions of the FPGA during runtime without influ-
encing other parts of the design. This feature allows the hardware to be adaptive to a changing
environment. First, it allows optimized hardware implementation to accelerate computation.
Second, it allows efficient use of chip area such that different hardware modules can be swapped
in/out of the chip at runtime. Last, it may allow leakage and clock distribution power saving by
unloading hardware modules that are not active.
RPR is extremely useful for robotic applications, as a mobile robot might encounter very
different environments as it navigates, and it might require different perception, localization,
or planning algorithms for these different environments. For instance, while a mobile robot is
in an indoor environment, it is likely to use an indoor map for localization, but when it travels
outdoor, it might choose to use GPS and visual-inertial odometry for localization. Keeping
multiple hardware accelerators for different tasks is not only costly but also energy inefficient.
RPR provides a perfect solution for this problem. As shown in Fig. 2.2, an FPGA is divided into
three partitions for the three basic functions, one for perception, one for localization, and one for
planning. Then for each function, there are three algorithms ready, one for each environment.
Each of these algorithms is converted to a bit file and ready for RPR when needed. For instance,
when a robot navigates to a new environment and decides that a new perception algorithm is
needed, it can load the target bit file and send it to the internal configuration access port (ICAP)
to reconfigure the perception partition.
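As an illustration of this environment-driven swap, the C++ sketch below loads the partial bitstream for the matching perception module and streams it to the configuration port. The device node and bitstream file names are assumptions made for the example; the actual path from the processor to the ICAP depends on the FPGA family and the system design.

#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical device node through which partial bitstreams reach the ICAP.
constexpr const char* kConfigPort = "/dev/fpga_config";

void reconfigurePerception(const std::string& environment) {
    // Pick the partial bitstream that matches the robot's current environment.
    const std::string bitfile = (environment == "indoor")
                                    ? "perception_indoor_partial.bit"
                                    : "perception_outdoor_partial.bit";

    std::ifstream in(bitfile, std::ios::binary);
    if (!in) throw std::runtime_error("missing bitstream: " + bitfile);
    std::vector<char> bits((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    std::ofstream port(kConfigPort, std::ios::binary);
    if (!port) throw std::runtime_error("cannot open configuration port");
    port.write(bits.data(), static_cast<std::streamsize>(bits.size()));
    // Only the perception partition is rewritten; the localization and
    // planning partitions keep running throughout.
}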
One major challenge of RPR for robotic computing is the configuration speed, as most
robotic tasks have strong real-time constraints, and to maintain the performance of the robot,
the reconfiguration process has to finish within a very tight time bound. In addition, the recon-
figuration process incurs performance and power overheads. By maximizing the configuration
speed, these overheads can be minimized as well.

2.2.2 HOW TO USE PARTIAL RECONFIGURATION?


PR allows the modification of an operating FPGA design by loading a PR file, or a bit file
through ICAP [56]. Using PR, after a full bit file configures the FPGA, partial bit files can
also be downloaded to modify reconfigurable regions in the FPGA without compromising the
integrity of the applications running on those parts of the device that are not being reconfigured.
RPR allows a limited, predefined portion of an FPGA to be reconfigured while the rest of
the device continues to operate, and this feature is especially valuable where devices operate
in a mission-critical environment that cannot be disrupted while some subsystems are being
redefined.

Figure 2.2: An example of partial reconfiguration for robotic applications (the FPGA is divided into three partitions holding swappable perception, localization, and planning modules, each loaded through the ICAP).
In an SRAM-based FPGA, all user-programmable features are controlled by memory
cells that are volatile and must be configured on power-up. These memory cells are known as
the configuration memory, and they define the look-up table (LUT) equations, signal routing,
input/output block (IOB) voltage standards, and all other aspects of the design. In order to
program the configuration memory, instructions for the configuration control logic and data for
the configuration memory are provided in the form of a bitstream, which is delivered to the
device through the JTAG, SelectMAP, serial, or ICAP configuration interface. An FPGA can
be partially reconfigured using a partial bitstream. A designer can use such a partial bitstream
to change the structure of one part of an FPGA design as the rest of the device continues to
operate.
RPR is useful for systems with multiple functions that can time-share the same FPGA
device resources. In such systems, one section of the FPGA continues to operate, while other
sections of the FPGA are disabled and reconfigured to provide new functionality. This is anal-
ogous to the situation where a microprocessor manages context switching between software
processes. In the case of PR of an FPGA, however, it is the hardware instead of the software
that is being switched.
RPR provides an advantage over multiple full bitstreams in applications that require con-
tinuous operation, which would not be possible during full reconfiguration. One example is a
mobile robot that switches the perception module while keeping the localization module and
planning module intact when moving from a dark environment to a bright environment. With
RPR, the system can maintain the localization and planning modules while the perception mod-
ule within the FPGA is changed on the fly.
Figure 2.3: FPGA regular and partial reconfiguration design flow (source files go through synthesis with behavioral simulation and functional verification, then layout with static analysis, followed by full and partial bit file generation, upload to the FPGA or partial reconfiguration, and in-circuit verification).

Xilinx has provided the PR feature in their high-end FPGAs, the Virtex series, in limited
access BETA since the late 1990s. More recently it is a production feature supported by their
tools and across their devices since the release of ISE 12. The support for this feature continues
to improve in the more recent release of ISE 13. Altera has promised this feature for their new
high-end devices, but this has not yet materialized. PR of FPGAs is a compelling design concept
for general purpose reconfigurable systems for its flexibility and extensibility.
Using the Xilinx tool chain, designers can go through the regular synthesis flow to generate
a single bitstream for programming the FPGA. This considers the device as a single atomic
entity. As opposed to the general synthesis flow, the PR flow physically divides the FPGA device
into regions. One region is called the “static region,” which is the portion of the device that is
programmed at startup and never changes. Another region is the “PR region,” which is the
portion of the device that will be reconfigured dynamically, potentially multiple times and with
different designs. It is possible to have multiple PR regions, but we will consider only the simplest
case here. The PR flow generates at least two bitstreams, one for the static and one for the
PR region. Most likely, there will be multiple PR bitstreams, one for each design that can be
dynamically loaded.
As shown in Fig. 2.3, the first step in implementing a system using the PR design flow is
the same as the regular design, which is to synthesize the netlists from the HDL sources that
will be used in the implementation and layout process. Note that the process requires separate
netlists for the static (top-level) designs and the PR partitions. A netlist must be generated for
each implementation of the PR partition used in the design. If the system design has multiple
PR partitions, then it will require a netlist for each implementation of each PR partition, even
if the logic is the same in multiple locations. Then once a netlist is done, we need to work on the
layout for each design to make sure that the netlist fits into the dedicated partition, and we need
to make sure that there are enough resources available for the design in each partition. Once
the implementation is done, we can then generate the bit file for each partition. At runtime,
we can dynamically swap different designs to a partition for the robot to adapt to the changing
environment. For more details on how to use PR on FPGAs, please refer to [57].

2.2.3 ACHIEVING HIGH PERFORMANCE


A major performance bottleneck for PR is the configuration overhead, which determines the
usefulness of PR. If PR is done fast enough, we can use this feature to enable mobile robots to
swap hardware components at runtime. If PR cannot be done fast enough, we can only use this
feature to perform offline hardware updates.
To address this problem, in [58], the authors propose a combination of two techniques
to minimize the overhead. First, the authors design and implement fully streaming DMA en-
gines to saturate the configuration throughput. Second, the authors exploit a simple form of
data redundancy to compress the configuration bitstreams, and implement an intelligent in-
ternal configuration access port (ICAP) controller to perform decompression at runtime. This
design achieves an effective configuration data transfer throughput of up to 1.2 GB/s, which
well surpasses the theoretical upper bound of the data transfer throughput, 400 MB/s. Specifi-
cally, the proposed fully streaming DMA engines reduce the configuration time from the range
of seconds to the range of milliseconds, a more than 1000-fold improvement. In addition, the
proposed compression scheme achieves up to a 75% reduction in bitstream size and results in a
decompression circuit with negligible hardware overhead.
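The exact compression format of [58] is not reproduced here, but the flavor of exploiting simple data redundancy can be illustrated with run-length encoding of repeated 32-bit configuration words, which a small on-chip circuit can undo at line rate. The C++ sketch below is an assumption-laden stand-in, not the authors' actual scheme.

#include <cstdint>
#include <utility>
#include <vector>

// Illustrative run-length encoding: collapse runs of identical 32-bit
// configuration words into (word, count) pairs. Partial bitstreams often
// contain long runs of identical words, which is the kind of redundancy a
// simple scheme like this can exploit.
std::vector<std::pair<uint32_t, uint32_t>>
compressBitstream(const std::vector<uint32_t>& words) {
    std::vector<std::pair<uint32_t, uint32_t>> runs;
    for (uint32_t w : words) {
        if (!runs.empty() && runs.back().first == w) {
            ++runs.back().second;       // extend the current run
        } else {
            runs.push_back({w, 1});     // start a new run
        }
    }
    return runs;
}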
Figure 2.4 shows the architecture of the fast PR engine, which consists of:
• a direct memory access (DMA) engine to establish a direct transfer link between the
external SRAM, where the configuration files are stored, and the ICAP;

• a streaming engine implemented with a FIFO queue to buffer data between the con-
sumer and the producer to eliminate the handshake between the producer and the
consumer for each data transfer; and

• burst-mode reads from the SRAM, so that four words are fetched at a time instead of
one.
We will explain this design in greater detail in the following sections.

Figure 2.4: Fast partial reconfiguration engine (the ICAP controller, containing the ICAP FSM, primary DMA, FIFO, and ICAP, communicates over the system bus with the SRAM controller, containing the secondary DMA, SRAM bridge, and SRAM interface).

Problems with the Out-of-Box PR Engine Design

Without the fast PR engine, in the out-of-box design, the ICAP Controller contains only the
ICAP and the ICAP FSM, and the SRAM Controller only contains the SRAM Bridge and
the SRAM Interface. Hence, there is no direct memory access between SRAM and ICAP, and
all configuration data transfers are done in software. In this way, the pipeline issues one read
instruction to fetch a configuration word from SRAM, and then issues a write instruction to
send the word to ICAP; instructions are also fetched from SRAM, and this process repeats
until the transfer process completes. This scheme is highly inefficient because the transfer of one
word requires tens of cycles, and the ICAP transfer throughput of this design is only 318 KB/s,
whereas on the product specification, the ideal ICAP throughput is 400 MB/s. Hence the out-
of-box design throughput is 1000 times worse than the ideal design.

Configuration Time is a Pure Function of the Bitstream Size?


Theoretically, the ICAP throughput can reach 400 MB/s, but this is achievable only if the config-
uration time is a pure function of bitstream file size. In order to find out whether this theoretical
throughput is achievable, the authors of [58] performed experiments to configure different re-
gions of the FPGA chip, to repeatedly write NOPs to ICAP, and to stress the configuration
circuit by repeatedly configuring one region. During all these tests, we found out that ICAP al-
ways ran at full speed such that it was able to consume four bytes of configuration data per cycle,
regardless of the semantics of the configuration data. This confirms that configuration time is a
pure function of the size of the bitstream file.

Adding the Primary-Secondary DMA Engines


To improve PR throughput, we first can simply implement a pair of primary-secondary DMA
engines. The primary DMA engine resides in the ICAP controller and interfaces with the ICAP
FSM, the ICAP, as well as the secondary DMA engine. The secondary DMA engine resides in
the SRAM Controller, and it interfaces with the SRAM Bridge and the primary DMA engine.
When a DMA operation starts, the primary DMA engine receives the starting address as well
as the size of the DMA operation. Then it starts sending control signals (read-enable, address,
etc.) to the secondary DMA engine, which then forwards the signals to the SRAM Bridge.
After the data is fetched, the secondary DMA engine sends the data back to the primary DMA
engine. Then, the primary DMA engine decrements the size counter, increments the address,
and repeats the process to fetch the next word. Compared to the out-of-box design, simply
adding the DMA engines avoids the involvement of the pipeline in the data transfer process
and it significantly increases the PR throughput to 50 MB/s, a 160-fold improvement.

Adding a FIFO between the DMA Engines


To further improve the PR throughput, we can modify the primary-secondary DMA engines
by adding a FIFO between the two DMA engines. In this version of the design, when DMA
operation starts, instead of sending control signals to the secondary DMA engine, the primary
DMA engine forwards the starting address and the size of the DMA operation to the secondary
DMA engine, then it waits for the data to become available in the FIFO. Once data becomes
available in the FIFO, the primary DMA engine reads the data and decrements its size counter.
When the counter hits zero, the DMA operation completes. On the other side, upon receiving
the starting address and size of the DMA operation, the secondary DMA engine starts sending
control signals to the SRAM Bridge to fetch data one word at a time. Then once the secondary
DMA engine receives data from the SRAM Bridge, it writes the word into the FIFO, decre-
ments its size counter, and increments its address register to fetch the next word. In this design,
only data is transferred between the primary and secondary DMA engines, and all control op-
erations to SRAM are handled in the secondary DMA. This greatly simplifies the handshaking
between the ICAP Controller and the SRAM Controller, and it leads to a 100 MB/s ICAP
throughput, an additional two-fold improvement.

Adding Burst Mode to Provide Fully Streaming


The SRAM on most FPGA boards usually provides burst read mode such that we can read
four words at a time instead of one. Burst mode reads are available on most DDR memories
as well. There is an ADVLD signal to the SRAM device. During a read, if this signal is set,
then a new address is loaded into the device. Otherwise, the device will output a burst of up to
four words, one word per cycle. Therefore, if we can set the ADVLD signal every four cycles,
each time we increment the address by four words, and given that the synchronization between
control signals and data fetches is correct, then we are able to stream data from SRAM to the
ICAP. We implement two independent state machines in the secondary DMA engine. One state
machine sends control signals as well as the addresses to the SRAM in a continuous manner,
such that in every four cycles, the address is incremented by four words (16 bytes) and sent to
the SRAM device. The other state machine simply waits for the data to become ready at the
beginning, and then in each cycle, it receives one word from the SRAM and streams the word
to the FIFO until the DMA operation completes. Similarly, the primary DMA engine waits for
data to become available in the FIFO, and then in each cycle, it reads one word from the FIFO
and streams the word to the ICAP until the DMA operation completes. This fully streaming
DMA design leads to an ICAP throughput that exceeds 395 MB/s, which is very close to the
ideal 400 MB/s throughput.
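The fully streaming engine of [58] is a hardware design; purely to illustrate the producer/consumer structure, the following high-level-synthesis-style C++ sketch shows a reader and a writer decoupled by a FIFO stream, with both loops pipelined to move one word per cycle. The port names and FIFO depth are assumptions, and a real implementation would also have to respect the SRAM burst protocol and ICAP handshaking described above.

#include <cstdint>
#include "hls_stream.h"

// Producer: fetch configuration words from memory and push them into the FIFO.
static void config_reader(const uint32_t* mem, int num_words,
                          hls::stream<uint32_t>& fifo) {
    for (int i = 0; i < num_words; ++i) {
#pragma HLS pipeline II=1
        fifo.write(mem[i]);             // one word per cycle once the burst is primed
    }
}

// Consumer: pop one word per cycle and hand it to the ICAP write port.
static void icap_writer(hls::stream<uint32_t>& fifo, int num_words,
                        volatile uint32_t* icap_port) {
    for (int i = 0; i < num_words; ++i) {
#pragma HLS pipeline II=1
        *icap_port = fifo.read();
    }
}

void streaming_pr_engine(const uint32_t* mem, int num_words,
                         volatile uint32_t* icap_port) {
#pragma HLS dataflow
    hls::stream<uint32_t> fifo("config_fifo");
#pragma HLS stream variable=fifo depth=64
    config_reader(mem, num_words, fifo);
    icap_writer(fifo, num_words, icap_port);
}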

Energy Efficiency
In [59], the authors indicate that the polarity of the FPGA hardware structures may significantly
impact leakage power consumption. Based on this observation, the authors of [60] tried to find
out whether FPGAs utilize this property such that when the blank bitstream is loaded to wipe
out an accelerator, the circuit is set to a state to minimize the leakage power consumption. In
order to achieve this, the authors implemented eight PR regions on an FPGA chip, with each
region occupying a configuration frame. These eight PR regions did not consume any dynamic
power, as the authors purposely gated off the clock to these regions. Then the authors used the
blank bitstream files to wipe out each of these regions and observed the chip power consumption
behavior. The results indicated that for every four configuration frames that we applied the blank
bitstream on, the chip power consumption dropped by a constant amount. This study confirms
that PR indeed leads to static power reduction and suggests that FPGAs may have utilized the
polarity property to minimize leakage power.
In addition, the authors of [60] studied whether PR can be used as an effective energy re-
duction technique in reconfigurable computing systems. To approach this problem, the authors
first identified the analytical models that capture the necessary conditions for energy reduc-
tion under different system configurations. The models show that increasing the configuration
throughput is a general and effective way to minimize the PR energy overhead. Therefore, the
authors designed and implemented a fully streaming DMA engine that nearly saturates the
configuration throughput.
The findings provide answers to the three questions: first, although we pay extra power to
use an accelerator, depending on the accelerator’s ability to accelerate the program execution, it
will result in actual energy reduction. The experimental results in [60] demonstrate that due to
its low power overhead and excellent ability of acceleration, having an acceleration extension can
lead to both program speedup and system energy reduction. Second, it is worthwhile to use PR
to reduce chip energy consumption if the energy reduction can make up for the energy overhead
incurred during the reconfiguration process; and the key to minimize the energy overhead during
the reconfiguration process is to maximize the configuration speed. The experimental results
in [60] confirm that enabling PR is a highly effective energy reduction technique. Finally, clock
gating is an effective technique in reducing energy consumption due to its negligible overhead;
however, it reduces only dynamic power whereas PR reduces both dynamic and static power.
Therefore, PR can lead to a larger energy reduction than clock gating, provided the extra energy
saving on static power elimination can make up for the energy overhead incurred during the
reconfiguration process.
Although the conventional wisdom is that PR is only useful if the accelerator would not
be used for a very long period of time, the experimental results in [60] indicate that with the
high configuration throughput delivered by the fast PR engine, PR can outperform clock gating
in energy reduction even if the accelerator inactive time is in the millisecond range. In summary,
based on the results from [58] and [60], we can conclude that PR is an effective technique for
improving both performance and energy efficiency, and it is the key feature that makes FPGAs
a highly attractive choice for dynamic robotic computing workloads.

2.2.4 REAL-WORLD CASE STUDY


Following the design presented in [60], PerceptIn has demonstrated in their commercial prod-
uct that RPR is useful for robotic computing, especially computing for autonomous vehicles,
because many on-vehicle tasks usually have multiple versions where each is used in a particular
scenario [20]. For instance, in PerceptIn’s design, the localization algorithm relies on salient
features; features in keyframes are extracted by a feature extraction algorithm (based on ORB
features [61]), whereas features in non-key frames are tracked from previous frames (using op-
tical flow [62]); the latter executes in 10 ms, 50% faster than the former. Spatially sharing the
FPGA is not only area-inefficient but also power-inefficient as the unused portion of the FPGA
consumes non-trivial static power. In order to temporally share the FPGA and “hot-swap” dif-
ferent algorithms, PerceptIn developed a Partial Reconfiguration Engine (PRE) that dynami-
cally reconfigures part of the FPGA at runtime. The PRE achieves a 400 MB/s reconfiguration
throughput (i.e., bitstream programming rate). Both the feature extraction and tracking bit-
streams are less than 4 MB. Thus, at 400 MB/s, the reconfiguration delay is less than 10 ms.
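The arithmetic behind such a hot-swap decision is simple enough to sketch: reconfiguration costs roughly the bitstream size divided by the PR throughput, and it pays off once the per-frame time saved by the swapped-in accelerator exceeds that cost. The C++ function below is an illustrative model with made-up parameter names, not PerceptIn's scheduling code.

// Decide whether hot-swapping accelerators pays off before the next swap.
// Reconfiguration cost = bitstream size / PR throughput;
// benefit = time saved per frame * frames served before swapping again.
bool swapIsWorthIt(double bitstream_mb, double pr_throughput_mb_per_s,
                   double time_saved_per_frame_ms, int frames_until_next_swap) {
    double reconfig_ms = bitstream_mb / pr_throughput_mb_per_s * 1000.0;
    double saving_ms = time_saved_per_frame_ms * frames_until_next_swap;
    return saving_ms > reconfig_ms;
}

// With the figures quoted above (a bitstream under 4 MB at 400 MB/s, i.e., under
// 10 ms of reconfiguration, and tracking saving about 10 ms per non-key frame),
// the swap amortizes after roughly one frame.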

2.3 ROBOT OPERATING SYSTEM (ROS) ON FPGAS


As demonstrated in the previous chapter, autonomous vehicles and robots demand complex in-
formation processing such as SLAM (Simultaneous Localization and Mapping), deep learning,
and many other tasks. FPGAs are promising in accelerating these applications with high energy
efficiency. However, utilizing FPGAs for robotic workloads is challenging due to the high de-
velopment costs and the shortage of engineers who understand both FPGAs and robotics. One way to
address this challenge is to directly support ROS on FPGAs as ROS already provides the basic
infrastructure for supporting efficient robotic computing. Hence, in this section we explore the
state-of-the-art supports for ROS to run on FPGAs.

2.3.1 ROBOT OPERATING SYSTEM (ROS)


Before delving into supports for running ROS on FPGAs, we first understand the importance
of ROS in robotic applications. ROS is an open-source, meta-operating system for autonomous
machines and robots. It provides the essential operating system services, including hardware
abstraction, low-level device control, implementation of commonly used functionality, message-
passing between processes, and package management. ROS also provides tools and libraries for
obtaining, building, writing, and running code across multiple computers. The primary goal
of ROS is to support code reuse in robotics research and development. In essence, ROS is a
distributed framework of processes that enables executables to be individually designed and
loosely coupled at runtime. These processes can be grouped into Packages and Stacks, which
can be easily shared and distributed. ROS also supports a federated system of code Repositories
that enable collaboration to be distributed as well. This design, from the file system level to the
community level, enables independent decisions about development and implementation, but
all can be brought together with ROS infrastructure tools [63].
The core objectives of the ROS framework include the following.

• Thin: ROS is designed to be as thin as possible so that code written for ROS can be
used with other robot software frameworks.

• ROS-agnostic libraries: the preferred development model is to write ROS-agnostic
libraries with clean functional interfaces.

• Language independence: the ROS framework is easy to implement in any modern
programming language. The ROS development team has already implemented it in
Python, C++, and Lisp, and we have experimental libraries in Java and Lua.

• Easy testing: ROS has a built-in unit/integration test framework called rostest that
makes it easy to bring up and tear down test fixtures.

• Scaling: ROS is appropriate for large runtime systems and large development processes.

The Computation Graph is the peer-to-peer network of ROS processes that are processing
data together. The basic Computation Graph concepts of ROS are nodes, Master, Parameter
Server, messages, services, topics, and bags, all of which provide data to the Graph in different
ways.

• Nodes: nodes are processes that perform computation. ROS is designed to be modular
at a fine-grained scale; a robot control system usually comprises many nodes. Take
autonomous vehicles as an example: one node controls a laser range-finder, one node
controls the wheel motors, one node performs localization, one node performs path
planning, one node provides a graphical view of the system, and so on. A ROS node
is written with the use of a ROS client library, such as roscpp or rospy.
• Master: the ROS Master provides name registration and lookup to the rest of the
Computation Graph. Without the Master, nodes would not be able to find each other,
exchange messages, or invoke services.
• Parameter Server: the parameter server allows data to be stored by key in a central
location. It is currently part of the Master.
• Messages: nodes communicate with each other by passing messages. A message is
simply a data structure, comprising typed fields. Standard primitive types (integer,
floating-point, boolean, etc.) are supported, as are arrays of primitive types. Messages
can include arbitrarily nested structures and arrays (much like C structs).
• Topics: messages are routed via a transport system with publish-subscribe semantics.
A node sends out a message by publishing it to a given topic. The topic is a name that
is used to identify the content of the message. A node that is interested in a certain
kind of data will subscribe to the appropriate topic. There may be multiple concurrent
publishers and subscribers for a single topic, and a single node may publish and sub-
scribe to multiple topics. In general, publishers and subscribers are not aware of each
others’ existence. The idea is to decouple the production of information from its con-
sumption. Logically, one can think of a topic as a strongly typed message bus. Each
bus has a name, and anyone can connect to the bus to send or receive messages as long
as they are the right type (a minimal publisher sketch follows this list).
• Services: the publish-subscribe model is a very flexible communication paradigm, but
its many-to-many, one-way transport is not appropriate for request-reply interactions,
which are often required in a distributed system. Request-reply is done via services,
which are defined by a pair of message structures: one for the request and one for the
reply. A providing node offers a service under a name and a client uses the service
by sending the request message and awaiting the reply. ROS client libraries generally
present this interaction to the programmer as if it were a remote procedure call.
• Bags: bags are a format for saving and playing back ROS message data. Bags are an
important mechanism for storing data, such as sensor data, that can be difficult to
collect but is necessary for developing and testing algorithms.
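As a concrete illustration of the publish-subscribe model, here is a minimal roscpp publisher that sends std_msgs/String messages on a topic named chatter at 10 Hz; the node and topic names are arbitrary choices for the example.

#include <ros/ros.h>
#include <std_msgs/String.h>

int main(int argc, char** argv) {
    ros::init(argc, argv, "talker");                 // register this node with the Master
    ros::NodeHandle nh;
    ros::Publisher pub = nh.advertise<std_msgs::String>("chatter", 10);

    ros::Rate rate(10);                              // publish at 10 Hz
    while (ros::ok()) {
        std_msgs::String msg;
        msg.data = "hello from the talker node";
        pub.publish(msg);                            // every subscriber to "chatter" receives this
        ros::spinOnce();
        rate.sleep();
    }
    return 0;
}

A matching subscriber node simply calls nh.subscribe("chatter", 10, callback) and then ros::spin(), and its callback is invoked for every message; neither side needs to know the other exists, which is the decoupling described above.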
The ROS Master acts as a name service in the ROS Computation Graph. It stores topics
and services registration information for ROS nodes. Nodes communicate with the Master to
report their registration information. As these nodes communicate with the Master, they can re-
ceive information about other registered nodes and make connections as appropriate. The Master
will also make callbacks to these nodes when this registration information changes, which allows
nodes to dynamically create connections as new nodes are run.

Figure 2.5: ROS-compliant FPGAs (a ROS-compliant FPGA component on an ARM-FPGA SoC: publisher and subscriber interfaces run on the ARM core, which passes data to and from the application logic on the FPGA through input and output interfaces, so that the whole component appears as a single ROS node).
Nodes connect to other nodes directly; the Master only provides lookup information,
much like a domain name service (DNS) server. Nodes that subscribe to a topic will request
connections from nodes that publish that topic and will establish that connection over an agreed-
upon connection protocol. This architecture allows for decoupled operations, where the names
are the primary means by which larger and more complex systems can be built. Names have a
very important role in ROS: nodes, topics, services, and parameters all have names. Every ROS
client library supports command-line remapping of names, which means a compiled program
can be reconfigured at runtime to operate in a different Computation Graph topology.

2.3.2 ROS-COMPLIANT FPGAS


In order to integrate FPGAs into a ROS-based system, a ROS-compliant FPGA component
has been proposed [64, 65]. Integration of an FPGA into a robotic system requires equivalent
functionality to replace a software ROS component with a ROS-compliant FPGA component.
Therefore, each ROS message type and data format used in the ROS-compliant FPGA com-
ponent must be the same as that of the software ROS component. The ROS-compliant FPGA
component aims to improve its processing performance while satisfying the requirements.
Figure 2.5 shows the architecture of the ROS-compliant FPGA component model. Each
ROS-compliant FPGA component must implement the following four functions: Encapsula-
tion of FPGA circuits, Interface between ROS software and FPGA circuits, Subscribe interface
from a topic, and Publish interface to a topic. The ARM core is responsible for communicating
with and offloading workloads to the FPGA, whereas the FPGA part performs actual workload
acceleration. Note that there are two roles of software in the component. First, an interface
process for input subscribes to a topic to receive input data. The software component, which
runs on the ARM core, is responsible for formatting the data suitable for the FPGA processing
and sends the formatted data to the FPGA. Second, an interface process for output receives
processing results from the FPGA. The software component, which runs on the ARM core, is
responsible for reformatting the results suitable for the ROS system and publishes them to a
topic. Such a structure can realize a robot system in which software and hardware cooperate.
Note that the difference of ROS-compliant FPGA component from a ROS node written
in pure software is that processing contains hardware processing of an FPGA. Integration of
ROS-compliant FPGA component into a ROS system only requires connections to ROS nodes
through Publish/Subscribe messaging in ordinary ROS development style. The ROS-compliant
FPGA component provides easy integration of an FPGA by wrapping it with software.
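A minimal sketch of what the software half of such a component could look like follows: a ROS node subscribes to a raw image topic, hands the buffer to the FPGA, and publishes the result. The fpga_process call and the topic names are hypothetical placeholders for whatever driver interface and message flow a concrete system uses.

#include <cstdint>
#include <vector>
#include <ros/ros.h>
#include <sensor_msgs/Image.h>

// Hypothetical driver call: hand the pixel buffer to the FPGA circuit and
// block until the processed result (e.g., a labeled image) is ready.
std::vector<uint8_t> fpga_process(const std::vector<uint8_t>& pixels,
                                  uint32_t width, uint32_t height);

ros::Publisher result_pub;

// Subscribe interface: reformat the ROS image for the FPGA, offload it, then
// publish the result back into the ROS graph.
void imageCallback(const sensor_msgs::Image::ConstPtr& msg) {
    std::vector<uint8_t> result = fpga_process(msg->data, msg->width, msg->height);

    sensor_msgs::Image out;
    out.header = msg->header;
    out.height = msg->height;
    out.width = msg->width;
    out.encoding = msg->encoding;
    out.step = msg->step;
    out.data = result;
    result_pub.publish(out);
}

int main(int argc, char** argv) {
    ros::init(argc, argv, "fpga_component");
    ros::NodeHandle nh;
    result_pub = nh.advertise<sensor_msgs::Image>("image_processed", 1);
    ros::Subscriber sub = nh.subscribe("image_raw", 1, imageCallback);
    ros::spin();    // the FPGA does the heavy lifting; the ARM core only moves data
    return 0;
}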
To evaluate this design, the authors of [65] have implemented a hardwired image labeling
application on a ROS-compliant FPGA component on a Xilinx Zynq-7020, and verified that
this design performs 26 times faster than a pure software implementation on the ARM processor, and even
2.3 times faster than that of an Intel PC. Moreover, the end-to-end latency of the component is
1.7 times faster than that of processing with pure software. Therefore, the authors verify that the
ROS-compliant FPGA component achieves remarkable performance improvement, maintain-
ing high development productivity by cooperative processing of hardware and software. How-
ever, this also comes with a problem, as the authors found out that the communication of ROS
nodes is a major bottleneck of the execution time in the ROS-compliant FPGA component.

2.3.3 OPTIMIZING COMMUNICATION LATENCY FOR THE ROS-COMPLIANT FPGAS
As indicated in the previous subsection, large communication latency between ROS components
is a severe problem and has been the bottleneck of offloading computing to FPGAs. The authors
in [66] aim to reduce the latency by implementing Publish/Subscribe messaging of ROS as
hardware. Based on the results of network packets analysis in the ROS system, the authors
propose a method of implementing a hardware ROS-compliant FPGA Component, which is
done by separating the registration part (XMLRPC) and data communication part (TCPROS)
of the Publish/Subscribe messaging.
To study ROS performance, the authors have compared the communication latency of
(1) PC-PC and (2) PC-ARM SoC. Two computer nodes are connected with each other through
a Gigabit Ethernet. The communication latency in (2) PC-ARM SoC environment is about four
times larger than (1) PC-PC. Therefore, the performance in embedded processor environments,
such as ARM processors, should be improved. Hence, the challenge for ROS-compliant FPGA
components is to reduce the large overhead in communication latency. If communication latency
is reduced, the ROS-compliant FPGA component can be used as an accelerator for processing
in robotic applications/systems.
In order to implement Publish/Subscribe messaging of ROS as hardware, the authors
analyzed network packets that flowed in Publish/Subscribe messaging in the ROS system of
ordinary software. The authors utilized WireShark for network packet analysis [67] with the
basic ROS setup of one master, one publisher, and one subscriber node.

• STEP (1): the Publisher and Subscriber nodes register their nodes and topic informa-
tion to the Master node. The registration is done by calling methods like registerPub-
lisher, hasParam, and so on, using XMLRPC [68].

• STEP (2): the Master node notifies topic information to the Subscriber nodes by call-
ing publisherUpdate (XMLRPC).

• STEP (3): the Subscriber node sends a connection request to the Publisher node by
using requestTopic (XMLRPC).

• STEP (4): the Publisher node returns IP address and port number, TCP connection
information for data communication, as a response to the requestTopic (XMLRPC).

• STEP (5): the Subscriber node establishes a TCP connection by using the information
and sends connection header to the TCP connection. Connection header contains im-
portant metadata about a connection being established, including typing and routing
information, using TCPROS [69].

• STEP (6): if it is a successful connection, the Publisher node sends connection header
(TCPROS).

• STEP (7): data transmission repeats. This data is written with little endian and header
information (4 bytes) is added to the data (TCPROS).

After this analysis, the authors found out that network packets that flowed in Pub-
lish/Subscribe messaging in the ROS system can be categorized into two parts, that is, the
registration part and the data transmission part. The registration part uses XMLRPC (STEPS
(1)–(4)), while the data transmission part uses TCPROS (STEPS (5)–(7)), which is almost
raw data of TCP communication with very small overhead. In addition, once data transmission
(STEP (7)) starts, only data transmission repeats without STEPS (1)–(6).
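For reference, the data transmission format in STEP (7) is straightforward to produce or parse in either hardware or software: each serialized message is preceded by its length as a 4-byte little-endian integer. The C++ helper below sketches this framing as described above; it is not code from the cited work.

#include <cstdint>
#include <vector>

// Append a 32-bit length in little-endian byte order.
static void appendLen(std::vector<uint8_t>& buf, uint32_t len) {
    for (int i = 0; i < 4; ++i) {
        buf.push_back(static_cast<uint8_t>((len >> (8 * i)) & 0xFF));
    }
}

// Frame one serialized ROS message for TCPROS: a 4-byte little-endian length,
// then the message bytes.
std::vector<uint8_t> frameMessage(const std::vector<uint8_t>& serialized) {
    std::vector<uint8_t> frame;
    appendLen(frame, static_cast<uint32_t>(serialized.size()));
    frame.insert(frame.end(), serialized.begin(), serialized.end());
    return frame;
}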
Based on the network packet analysis, the authors modified the server ports, such that
those used in XMLRPC and TCPROS are assigned differently. In addition, a client TCP/IP
connection of XMLRPC for the Master node is necessary for the Publisher node. For the
Subscriber node, two client TCP/IP connections of XMLRPC and one client connection of
TCPROS are necessary. Therefore, two or three TCP ports are necessary to implement Pub-
lish/Subscribe messaging. It is a problem to implement ROS nodes using the hardware TCP/IP
stack.
To optimize the communication performance on ROS-compliant FPGAs, the authors
proposed hardware publication and subscription services. Conventionally, publication or sub-
scription of topics was done by software in ROS. By implementing these nodes as hardwired
circuits, direct communication between the ROS nodes and the FPGA becomes not only pos-
sible but also highly efficient. In order to implement the hardware ROS nodes, the authors
designed the Subscriber hardware and the Publisher hardware separately: the Subscriber hard-
ware is responsible to subscribe to a topic of another ROS node and to receive ROS messages
from the topic; whereas the Publisher hardware is responsible to publish ROS messages to a
topic of another ROS node. With this hardware-based design, the evaluation results indicate
that the latency of the Hardware ROS-compliant FPGA component can be cut to half, from
1.0 ms to 0.5 ms, thus effectively improving the communication between the FPGA accelerator
and other software-based ROS nodes.

2.4 SUMMARY
In this chapter, we have provided a general introduction to FPGA technologies, especially run-
time partial reconfiguration, which allows multiple robotic workloads to time-share an FPGA
at runtime. We also have introduced existing research on enabling ROS on FPGAs, which pro-
vides infrastructure supports for various robotic workloads to run directly on FPGAs. However,
the ecosystem of robotic computing on FPGAs is still in its infancy. For instance, due to the
lack of high-level synthesis tools for robotic accelerator design, accelerating robotic workloads,
or part of a robotic workload, on FPGAs still requires extensive manual effort. To make the
matter worse, most robotic engineers do not have sufficient FPGA background to develop an
FPGA-based accelerator, whereas few FPGA engineers possess sufficient robotic background
to fully understand a robotic system. Hence, to fully exploit the benefits of FPGAs, advanced
design automation tools are imperative to bridge this knowledge gap.

CHAPTER 3

Perception on FPGAs – Deep Learning
Cameras are widely used in intelligent robot systems because they are lightweight and provide
rich information for perception. Cameras can be used to complete a variety of basic tasks of intelligent
robots, such as visual odometry (VO), place recognition, object detection, and recognition. With
the development of convolutional neural networks (CNNs), we can reconstruct the depth and
pose with the absolute scale directly from a monocular camera, making monocular VO more
robust and efficient. And monocular VO methods, like Depth-VO-Feat [70], make robot sys-
tems much easier to deploy than stereo ones. Furthermore, although there are previous works
that design accelerators for robot applications, such as ESLAM [71], these accelerators can only
be used for specific applications and scale poorly.
In recent years, CNNs have brought great improvements to place recognition for robotic
perception. The place recognition accuracy of another CNN-based method,
GeM [72], is about 20% better than that of the handcrafted method, rootSIFT [73]. CNNs provide a
general framework that can be applied to a variety of robotic applications. With the help of CNNs,
the robots can also detect and distinguish objects from input images. In summary, CNNs greatly
enhance robots’ ability in localization, place recognition, and many other perception tasks.
CNNs have become the core component in various kinds of robots. However, since neural
networks (NNs) are computationally intensive, deep learning models are often the performance
bottleneck in robots. In this chapter, we delve into utilizing FPGAs to accelerate neural networks
in various robotic workloads.
Specifically, neural networks are widely adopted in areas like image, speech, and video
recognition. What’s more, deep learning has made significant progress in solving robotic per-
ception problems. But the high computation and storage complexity of neural network inference
poses great difficulty for its application. CPUs can hardly offer enough computational capacity.
GPUs are the first choice for neural network processing because of their high computational
capacity and easy-to-use development frameworks, but they suffer from energy inefficiency.
On the other hand, with specifically designed hardware, FPGAs are a potential candi-
date to surpass GPUs in performance and energy efficiency. Various FPGA-based accelerators
have been proposed with software and hardware optimization techniques to achieve high per-
formance and energy efficiency. In this chapter, we give an overview of previous work on neural
network inference accelerators based on FPGAs and summarize the main techniques used. An
investigation from software to hardware, from circuit level to system level, is carried out for
a complete analysis of FPGA-based deep learning accelerators and serves as a guide to future
work.

Table 3.1: Year, parameter count, operation count, and ImageNet top-1 accuracy of state-of-the-art neural network models

               AlexNet [74]   VGG19 [78]   ResNet152 [81]   MobileNet [79]   ShuffleNet [80]
Year           2012           2014         2016             2017             2017
# Param        60M            144M         57M              4.2M             2.36M
# Operation    1.4G           39G          22.6G            1.1G             0.27G
Top-1 Acc.     61.0%          74.5%        79.3%            70.6%            67.6%

3.1 WHY CHOOSE FPGAS FOR DEEP LEARNING?


Recent research works on neural networks demonstrate great improvements over traditional al-
gorithms in machine learning. Various network models, like CNNs and recurrent neural networks
(RNNs), have been proposed for image, video, and speech processing. CNNs [74] improve the
top-5 image classification accuracy on the ImageNet [75] dataset from 73.8% to 84.7% in 2012 and fur-
ther improve object detection [76] with its outstanding ability in feature extraction. RNNs [77]
achieve the state-of-the-art word error rate on speech recognition. In general, NNs feature a
high fitting ability to a wide range of pattern recognition problems. This ability makes NNs
promising candidates for many artificial intelligence applications.
But the computation and storage complexity of NN models are high. In Table 3.1, we list
the number of operations (additions or multiplications), the number of parameters, and the top-1
accuracy on the ImageNet dataset [75] of state-of-the-art CNN models. Take CNNs as an example. The largest
CNN model for a 224 × 224 image classification requires up to 39 billion floating-point opera-
tions (FLOP) and more than 500 MB model parameters [78]. As the computational complexity
is proportional to the input image size, processing images with higher resolutions may need more
than 100 billion operations. Latest works like MobileNet [79] and ShuffleNet [80] are trying
to reduce the network size with advanced network structures, but with obvious accuracy loss.
The balance between the size of NN models and accuracy is still an open question today. In
some cases, the large model size hinders the application of NNs, especially in power-limited or
latency-critical scenarios.
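The scaling noted above follows directly from how a convolutional layer's cost is counted. The C++ helper below gives the usual back-of-the-envelope figures: doubling the input height and width roughly quadruples the multiply-accumulate count, while the parameter count stays fixed.

#include <cstdint>

struct ConvCost {
    uint64_t params;   // weights plus biases
    uint64_t macs;     // multiply-accumulate operations for one forward pass
};

// Cost of one convolutional layer with a kernel x kernel filter.
ConvCost convCost(uint64_t in_ch, uint64_t out_ch, uint64_t kernel,
                  uint64_t out_h, uint64_t out_w) {
    ConvCost c;
    c.params = kernel * kernel * in_ch * out_ch + out_ch;
    c.macs   = out_h * out_w * out_ch * kernel * kernel * in_ch;
    return c;
}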
Therefore, choosing a proper computation platform for neural-network-based applications is essential. A typical CPU can perform 10–100 GFLOP per second, and its power efficiency is usually below 1 GOP/J. CPUs can therefore meet neither the high-performance requirements of cloud applications nor the low-power requirements of mobile applications. In contrast, GPUs offer much higher computational capacity, although, as noted above, their energy efficiency remains a concern.
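A quick feasibility check, using only the throughput and efficiency figures quoted above (illustrative numbers, not measurements), makes the gap concrete for a VGG19-sized workload on such a CPU:

# Illustrative feasibility check using the figures quoted in the text.
vgg19_ops = 39e9        # ~39 GFLOP per 224x224 inference (Table 3.1)
cpu_throughput = 100e9  # optimistic CPU throughput: 100 GFLOP/s
cpu_efficiency = 1e9    # ~1 GOP/J power efficiency

latency = vgg19_ops / cpu_throughput    # seconds per inference
energy = vgg19_ops / cpu_efficiency     # joules per inference

print(f"latency: {latency * 1000:.0f} ms/frame, energy: {energy:.0f} J/frame")
# Roughly 390 ms and 39 J per frame: too slow for real-time perception and far
# too power-hungry for a mobile robot, which motivates GPU and FPGA accelerators.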