Arithmetic Optimization Techniques for Hardware and Software Design
Obtain better system performance, lower energy consumption, and avoid hand-
coding arithmetic functions with this concise guide to automated optimization
techniques for hardware and software design. High-level compiler optimizations
and high-speed architectures for implementing FIR filters are covered, which can
improve performance in communications, signal processing, computer graphics,
and cryptography. Clearly explained algorithms and illustrative examples through-
out make it easy to understand the techniques and write software for their imple-
mentation. Background information on the synthesis of arithmetic expressions and
computer arithmetic is also included, making the book ideal for newcomers to
the subject. This is an invaluable resource for researchers, professionals, and
graduate students working in system level design and automation, compilers, and
VLSI CAD.
FARZAN FALLAH
Stanford University
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521880992
© Cambridge University Press 2010
1 Introduction 1
1.1 Overview 1
1.2 Salient features of this book 5
1.3 Organization 6
1.4 Target audience 7
3 Software compilation 21
3.1 Chapter overview 21
3.2 Basic software compiler structure 21
3.3 Algebraic transformations in optimizing software compilers 25
3.4 Summary 33
4 Hardware synthesis 35
4.1 Chapter overview 35
4.2 Hardware synthesis design flow 35
4.3 System specification 38
4.4 Program representation 39
4.5 Algorithmic optimization 44
4.6 Resource allocation 45
4.7 Operation scheduling 49
6 Polynomial expressions 95
6.1 Chapter overview 95
6.2 Polynomial expressions 95
6.3 Problem formulation 96
6.4 Related optimization techniques 96
6.5 Algebraic optimization of arithmetic expressions 99
6.6 Experimental results 113
6.7 Optimal solutions for reducing the number of operations
in arithmetic expressions 117
6.8 Summary 123
Index 182
Abbreviations
1.1 Overview
Arithmetic is one of the oldest topics in computing. It dates back to the many early
civilizations that used the abacus to perform arithmetic operations. The seventeenth
and eighteenth centuries brought many advances with the invention of mechanical
counting machines like the slide rule, Schickard’s Calculating Clock, Leibniz’s
Stepped Reckoner, the Pascaline, and Babbage’s Difference and Analytical Engines.
The vacuum tube computers of the early twentieth century were the first program-
mable, digital, electronic, computing devices. The introduction of the integrated
circuit in the 1950s heralded the present era where the complexity of computing
resources is growing exponentially. Today’s computers perform extremely advanced
operations such as wireless communication and audio, image, and video processing,
and are capable of performing over 10¹⁵ operations per second.
Because computer arithmetic is a well-studied field, it should
come as no surprise that there are many books on the various subtopics of
computer arithmetic. This book provides a focused view on the optimization of
polynomial functions and linear systems. The book discusses optimizations that
are applicable to both software and hardware design flows; e.g., it describes the
best way to implement arithmetic operations when your target computational
device is a digital signal processor (DSP), a field programmable gate array
(FPGA) or an application specific integrated circuit (ASIC).
Polynomials are among the most important functions in mathematics and are
used in algebraic number theory, geometry, and applied analysis. Polynomial
functions appear in applications ranging from basic chemistry and physics to
economics, and are used in calculus and numerical analysis to approximate other
functions. Furthermore, they are used to construct polynomial rings, a powerful
concept in algebra and algebraic geometry.
One of the most important computational uses of polynomials is function
evaluation, which lies at the core of many computationally intensive applications.
Elementary functions such as sin, cos, tan, sin⁻¹, cos⁻¹, sinh, cosh, tanh, exponen-
tiation and logarithm are often approximated using a polynomial function.
Producing an approximation of a function with the required accuracy in a rather
large interval may require a polynomial of a large degree. For instance, approximating
the function ln(1 + x) in the range [−1/2, 1/2] with an error less than 10⁻⁸ is one
such case.
[Figure: embedded system design flow, from computational analysis and system specification through hardware/software partitioning to the register transfer level description and the final embedded system.]
Hardware synthesis tools generate a register transfer level (RTL) description from the
behavior represented in the HDL [4]. In addition, the tools perform optimizations
such as redundancy elimination (common subexpression elimination (CSE) and
value numbering) and critical path minimization. The constant multiplications in
the linear systems and polynomials can be decomposed into shifts and additions
and the resulting complexity can be further reduced by eliminating common
subexpressions [5–8]. Furthermore, there are some numeric transformations of
the constant coefficients that can be applied to linear transforms to reduce the
strength of the operations [9, 10]. This book provides an in-depth discussion of
such transforms. The order and priorities of the various optimizations and
transformations are largely application dependent and are the subject of current
research. In most cases, this is done by evaluating a number of transformations
and selecting the one that best meets the constraints [11]. The RTL description is
then synthesized into a gate level netlist, which is subsequently placed and routed
using standard physical design tools.
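As a rough illustration of decomposing a constant multiplication into shifts and additions (mentioned above), the following Python sketch rewrites multiplication by a constant as a sum of left-shifted terms taken from the constant's binary representation; the function name is ours and it is not part of any particular synthesis tool. Eliminating shifted terms that are shared across several such expressions is then a common subexpression elimination problem.

def constant_mult_to_shift_adds(constant, var="x"):
    """Rewrite var * constant as a sum of left shifts of var.

    Each set bit k of the (non-negative) constant contributes the
    term (var << k); a multiplier is replaced by shifts and adds.
    """
    terms = []
    bit = 0
    while constant:
        if constant & 1:
            terms.append(var if bit == 0 else "({} << {})".format(var, bit))
        constant >>= 1
        bit += 1
    return " + ".join(terms) if terms else "0"

# 105 = 0b1101001, so y = 105 * x becomes x + (x << 3) + (x << 5) + (x << 6).
print(constant_mult_to_shift_adds(105, "x"))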
For the software portion of the design, custom instructions tuned to the
particular application may be added [12–14]. Certain computation intensive
kernels of the application may require platform dependent software in order to
achieve the best performance on the available architecture. This is often done
manually by selecting the relevant functions from optimized software libraries.
For some domains, including signal processing applications, automatic library
generators are available [11]. The software is then compiled using various trans-
formations and optimization techniques [15]. Unfortunately, these compiler
optimizations perform limited transformations for reducing the complexity of
polynomial expressions and linear systems. For some applications, the generated
assembly code is optimized (mostly manually) to improve performance, though it
is not practical for large and complex programs. An assembler and a linker are
then used to generate the executable code.
Opportunities for optimizing polynomial expressions and linear systems exist
for both the hardware and the software implementations. These optimizations
have the potential for huge impact on the performance and power consumption
of the embedded systems. This book presents techniques and algorithms for
performing such optimizations during both the hardware design flow and the
software compilation.
The unique feature of this book is its treatment of the hardware synthesis and
software compilation of arithmetic expressions. It is the first book to discuss
automated optimization techniques for arithmetic expressions. The previous
literature on this topic, e.g., [16] and [17], deals only with the details of implement-
ing arithmetic intensive functions, but stops short of discussing techniques to
optimize them for different target architectures. The book gives a detailed intro-
duction to the kind of arithmetic expressions that occur in real-life applications,
such as signal processing and computer graphics. It shows the reader the import-
ance of optimizing arithmetic expressions to meet performance and resource
constraints and improve the quality of silicon. The book describes in detail the
different techniques for performing hardware and software optimizations. It also
describes how these techniques can be tuned to improve different parameters such
as the performance, power consumption, and area of the synthesized hardware.
Though most of the algorithms described in it are heuristics, the book also shows
how optimal solutions to these problems can be modeled using integer linear
programming (ILP). The usefulness of these techniques is then verified by applying
them on real benchmarks.
In short, this book gives a comprehensive overview of an important problem
in the design and optimization of arithmetic intensive embedded systems.
It describes in detail the state of the art techniques that have been developed to
solve this problem. This book does not go into detail about the mathematics
behind the arithmetic expressions. It assumes that system designers have per-
formed an analysis of the system and have come up with a set of polynomial
equations that describe the functionality of the system, within an acceptable error.
Furthermore, it assumes that the system designer has decided what is the best
architecture (software, ASIC or FPGA or a combination of them) to implement
the arithmetic function. The book does not talk about techniques to verify the
precision of the optimized arithmetic expressions. Techniques such as those dis-
cussed in [2] and [18] can be used to verify if the expressions produce errors within
acceptable limits.
1.4 Target audience
When writing this book we had several audiences in mind. Much of the material
is targeted towards specialists, whether they be researchers in academia or
industry, who are designing both software and hardware for polynomial expres-
sions and/or linear systems. The book also provides substantial background of
the state of the art algorithms for the implementation of these systems, and
serves as a reference for researchers in these areas. This book is designed to
accommodate readers with different backgrounds, and the book includes some
basic introductory material on several topics including computer arithmetic,
software compilation, and hardware synthesis. These introductory chapters give
just enough background to demonstrate basic ideas and provide references to
gain more in-depth information. Most of the book can be understood by anyone
with a basic grounding in computer engineering. The book is suitable for
graduate students, either as a reference or as a textbook for a specialized class
on the topics of hardware synthesis and software compilation for linear systems
and polynomial expressions. It is also suitable for an advanced topics class for
undergraduate students.
References
[4] G.D. Micheli, Synthesis and optimization of digital circuits, New York, NY:
McGraw-Hill, 1994.
[5] M. Potkonjak, M.B. Srivastava, and A.P. Chandrakasan, Multiple constant
multiplications: efficient and versatile framework and algorithms for exploring
common subexpression elimination, IEEE Transactions on Computer Aided Design
of Integrated Circuits and Systems, 15(2), 151–65, 1996.
[6] R. Pasko, P. Schaumont, V. Derudder, V. Vernalde, and D. Durackova, A new
algorithm for elimination of common subexpressions, IEEE Transactions on Computer
Aided Design of Integrated Circuits and Systems, 18(1), 58–68, 1999.
[7] R. Pasko, P. Schaumont, V. Derudder, and D. Durackova, Optimization method for
broadband modem FIR filter design using common subexpression elimination,
International Symposium on System Synthesis, 1997. Washington, DC: IEEE Computer
Society, 1997.
[8] A. Hosangadi, F. Fallah, and R. Kastner, Common subexpression elimination
involving multiple variables for linear DSP synthesis, IEEE International Conference on
Application-Specific Architectures and Processors, 2004. Washington, DC: IEEE
Computer Society, 2004.
[9] A. Chatterjee, R.K. Roy, and M.A. D’Abreu, Greedy hardware optimization
for linear digital circuits using number splitting and refactorization, IEEE Transactions
on Very Large Scale Integration (VLSI) Systems, 1(4), 423–31, 1993.
[10] H.T. Nguyen and A. Chatterjee, Number-splitting with shift-and-add decomposition
for power and hardware optimization in linear DSP synthesis, IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, 8, 419–24, 2000.
[11] M. Puschel, B. Singer, J. Xiong, et al., SPIRAL: a generator for platform-adapted
libraries of signal processing algorithms, Journal of High Performance Computing and
Applications, 18, 21–45, 2004.
[12] R. Kastner, S. Ogrenci-Memik, E. Bozorgzadeh, and M. Sarrafzadeh, Instruction
generation for hybrid reconfigurable systems, International Conference on Computer
Aided Design. New York, NY: ACM, 2001.
[13] A. Peymandoust, L. Pozzi, P. Ienne, and G. De Micheli, Automatic instruction set
extension and utilization for embedded processors, IEEE International Conference on
Application-Specific Systems, Architectures, and Processors, 2003. Washington, DC:
IEEE Computer Society, 2003.
[14] Tensilica Inc., http://www.tensilica.com.
[15] S.S. Muchnick, Advanced Compiler Design and Implementation, San Francisco, CA:
Morgan Kaufmann Publishers, 1997.
[16] J.P. Deschamps, G.J.A. Bioul, and G.D. Sutter, Synthesis of Arithmetic Circuits:
FPGA, ASIC and Embedded Systems, New York, NY: Wiley-Interscience, 2006.
[17] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays,
third edition. Springer, 2007.
[18] C. Fang Fang, R.A. Rutenbar, M. Puschel, and T. Chen, Toward efficient static
analysis of Finite-Precision effects in DSP applications via affine arithmetic modeling,
Design Automation Conference. New York, NY: ACM, 2003.
2 Use of polynomial expressions
and linear systems
Polynomial expressions and linear systems are found in a wide range of applica-
tions: perhaps most fundamentally, Taylor’s theorem states that any differentiable
function can be approximated by a polynomial. Polynomial approximations are
used extensively in computer graphics to model geometric objects. Many of the
fundamental digital signal processing transformations are modeled as linear
systems, including FIR filters, DCT and H.264 video compression. Cryptographic
systems, in particular, those that perform exponentiation during public key
encryption, are amenable to modeling using polynomial expressions. Finally,
address calculation during data intensive applications requires a number of add
and multiply operations that grows larger as the size and dimension of the array
increases. This chapter describes these and other applications that require arith-
metic computation. We show that polynomial expressions and linear systems are
found in a variety of applications that are driving the embedded systems and high-
performance computing markets.
sin(x) = x − x³/3! + x⁵/5! − x⁷/7!.   (2.1)
This is a polynomial of degree 7 that approximates the sine function. Assuming
that the terms 1/3!, 1/5!, and 1/7! are precomputed (these will be denoted as S3, S5,
and S7, respectively), the approximation can be evaluated as
d1 = x·x,
d2 = S5 − S7·d1,
d3 = d2·d1 − S3,
d4 = d3·d1 + 1,
sin(x) = x·d4.
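The factored evaluation above can be checked with a few lines of Python; the helper name approx_sin and the use of double-precision arithmetic are our own choices for illustration.

import math

S3, S5, S7 = 1 / math.factorial(3), 1 / math.factorial(5), 1 / math.factorial(7)

def approx_sin(x):
    """Degree-7 approximation of sin(x) using the factored form above."""
    d1 = x * x            # x^2
    d2 = S5 - S7 * d1     # S5 - S7*x^2
    d3 = d2 * d1 - S3     # S5*x^2 - S7*x^4 - S3
    d4 = d3 * d1 + 1      # 1 - S3*x^2 + S5*x^4 - S7*x^6
    return x * d4         # x - S3*x^3 + S5*x^5 - S7*x^7

print(approx_sin(0.5), math.sin(0.5))   # the two values agree closely for small |x|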
A spline – a piecewise polynomial – is often preferred to having one high-degree polynomial approximation for the entire curve. In general,
the spline interpolation yields similar accuracy to modeling the same curve using one
higher-degree polynomial. Therefore, splines are less computationally complex.
Consider a quartic spline – a spline where the polynomials have degree less than
or equal to 4, i.e., k = 4. A quartic spline is smooth in both first and second
derivatives and continuous in the third derivative. The unoptimized polynomial
expression representing a quartic spline is

P = zu⁴ + 4au³v + 6bu²v² + 4wuv³ + qv⁴.   (2.2)

Extracting common subexpressions, this can be rewritten as

d1 = u²;  d2 = v²;  d3 = uv;
P = d1²z + 4a·d1·d3 + 6b·d1·d2 + 4w·d2·d3 + q·d2².   (2.3)
Note that three two-term common subexpressions were extracted: u², v², and uv.
This form requires 16 multiplications and 4 additions, reducing the number of
multiplications by seven from the straightforward implementation. Alternatively,
the Horner form can be used to reduce the number of operations. The Horner
form is a way of efficiently computing a polynomial by viewing it as a linear
combination of monomials. In the Horner form, a polynomial is converted into a
nested sequence of multiplications and additions, which is very efficient to com-
pute with multiply accumulate (MAC) operations; it is a popular form for evalu-
ating many polynomials in signal processing libraries including the GNU C library [3].
The Horner form of the quartic spline polynomial is

P = u(u(u(uz + 4av) + 6bv²) + 4wv³) + qv⁴.   (2.4)

Applying algebraic optimization techniques instead yields a cheaper form:

d1 = v²;  d2 = 4v;
P = u³(uz + a·d2) + d1(q·d1 + u(w·d2 + 6bu)).   (2.5)
The output value y[n] is computed by multiplying the L most recent input
samples from the input vector x by a set of constant coefficients stored in the h
vector, where |h| = L. Equivalently, h[k] represents the kth constant coefficient of
the filter, x[n] represents the input time series, and y[n] is the output time series.
The constants vary depending on the type of filter (e.g., low-pass, high-pass,
Butterworth).
There are many different implementations for an FIR filter. The conventional
tapped delay-line realization of this inner product is shown in Figure 2.1. The com-
putation of each output sample consists of L constant multiplications and L − 1
additions.
Figure 2.1 The tapped delay line representation of an FIR filter with L taps. Each output
sample requires L constant multiplications and L − 1 additions.
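A direct, unoptimized software realization of this tapped delay line might look as follows; the function name and the example coefficients are illustrative only.

def fir_filter(x, h):
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].

    Each output sample costs L multiplications and L - 1 additions,
    matching the tapped delay line of Figure 2.1.
    """
    L = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(L):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

# A 4-tap moving-average filter applied to a short input sequence.
print(fir_filter([1, 2, 3, 4, 5], [0.25, 0.25, 0.25, 0.25]))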
As an example, consider the DCT [4], which is commonly used for compression
in many signal processing systems. For example, the DCT is used in both JPEG
and MPEG compression [5]. The DCT expresses data as a sum of sinusoids, in a
Figure 2.2 The constant matrix for a four-point DCT. The matrix on the right provides
a simple substitution of the variables A, B, C, and D for the constants cos(0), cos(π/8),
cos(3π/8), and cos(π/4), respectively.
Figure 2.3 (a) A four-point DCT represented as a multiplication of input vector with
a constant matrix and (b) the corresponding set of equations.
similar manner to the Fourier transform. In fact, it is a special case of the DFT [6],
but uses only real numbers (corresponding to the cosine values of complex
exponentials, hence the name). The DCT has strong energy compaction, which
is ideal for compression of image and video data, where most of the signal infor-
mation is found in the lower-frequency components.
DCT can be modeled according to Equation (2.7), where the constant matrix
(C) and a vector of input samples (X) are multiplied to compute the output vector Y.
The constant matrix for a four-point DCT is shown in Figure 2.2. The matrix
multiplication with a vector of input samples is shown in Figure 2.3. In the figures,
A, B, C and D can be viewed as distinct constants. The straightforward computation
of this matrix multiplication requires 16 multiplications and 12 additions/subtrac-
tions. In general O(N2) operations are required for an N-point DCT. However,
by extracting common factors, these expressions can be rewritten as shown in
Figure 2.4. This implementation is cheaper than the original implementation by
ten multiplications and four additions/subtractions. In general, factorization of
DCT equations can reduce the number of operations to O(N log N). However, these
optimizations are typically done manually in hand-coded signal processing libraries;
the methods discussed in this book can extract these common factors and common
subexpressions automatically.
The 4 × 4 linear integer transform used in H.264 [5] is another example of
a linear system found in signal processing. H.264 is a digital video codec that achieves
a very high data compression rate. The integer transform for the video encoding
can be optimized in a similar manner; Figure 2.5 shows the result.
d1 = x0 + x3
d2 = x1 + x2
d3 = x1 – x2
d4 = x0 – x3
y0 = A × (d1 + d2)
y1 = B × d4 + C × d3
y2 = D × (d1 – d2)
y3 = C × d4 – B × d3
Figure 2.4 The four-point DCT after using techniques to eliminate common subexpressions.
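The savings can be checked with a short sketch (our own, with the constants taken from Figure 2.2): it evaluates a four-point DCT both in the straightforward form of Figure 2.3 and in the factored form of Figure 2.4, and the two produce the same outputs.

import math

# Constants from Figure 2.2 (unnormalized four-point DCT).
A = math.cos(0)
B = math.cos(math.pi / 8)
C = math.cos(3 * math.pi / 8)
D = math.cos(math.pi / 4)

def dct4_direct(x):
    """Straightforward form: 16 multiplications and 12 additions/subtractions."""
    x0, x1, x2, x3 = x
    return [A*x0 + A*x1 + A*x2 + A*x3,
            B*x0 + C*x1 - C*x2 - B*x3,
            D*x0 - D*x1 - D*x2 + D*x3,
            C*x0 - B*x1 + B*x2 - C*x3]

def dct4_factored(x):
    """Factored form of Figure 2.4: 6 multiplications and 8 additions/subtractions."""
    x0, x1, x2, x3 = x
    d1, d2 = x0 + x3, x1 + x2
    d3, d4 = x1 - x2, x0 - x3
    return [A*(d1 + d2), B*d4 + C*d3, D*(d1 - d2), C*d4 - B*d3]

print(dct4_direct([1, 2, 3, 4]))
print(dct4_factored([1, 2, 3, 4]))    # same four values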
¹ Let DCT A = H.264 A, DCT B = H.264 B, DCT C = H.264 A, DCT D = H.264 A.
Figure 2.5 H.264 integer transform after extracting common subexpressions and applying
strength reduction on the constant multiplications.
2.5 Cryptography
(a) Method of squaring:

t1 = x · x = x²              (10)
t2 = t1 · t1 = x⁴            (100)
t3 = t2 · x = x⁵             (101)
t4 = t3 · t3 = x¹⁰           (1010)
t5 = t4 · x = x¹¹            (1011)
t6 = t5 · t5 = x²²           (10110)
t7 = t6 · t6 = x⁴⁴           (101100)
t8 = t7 · x = x⁴⁵            (101101)
t9 = t8 · t8 = x⁹⁰           (1011010)
t10 = t9 · x = x⁹¹           (1011011)
t11 = t10 · t10 = x¹⁸²       (10110110)
t12 = t11 · t11 = x³⁶⁴       (101101100)
t13 = t12 · x = x³⁶⁵         (101101101)
t14 = t13 · t13 = x⁷³⁰       (1011011010)
t15 = t14 · x = x⁷³¹         (1011011011)
t16 = t15 · t15 = x¹⁴⁶²      (10110110110)
t17 = t16 · t16 = x²⁹²⁴      (101101101100)
t18 = t17 · x = x²⁹²⁵        (101101101101)

(b) Eliminating common computations:

P = 101101101101 = 2925
d1 = 101
d2 = d1 + d1 << 3
P = d2 + d2 << 6

t1 = x · x = x²              (10)
t2 = t1 · t1 = x⁴            (100)
t3 = t2 · x = x⁵             (101)  (d1)
t4 = t3 · t3 = x¹⁰           (1010)
t5 = t4 · t4 = x²⁰           (10100)
t6 = t5 · t5 = x⁴⁰           (101000)
t7 = t6 · t3 = x⁴⁵           (101101)  (d2)
t8 = t7 · t7 = x⁹⁰           (1011010)
t9 = t8 · t8 = x¹⁸⁰          (10110100)
t10 = t9 · t9 = x³⁶⁰         (101101000)
t11 = t10 · t10 = x⁷²⁰       (1011010000)
t12 = t11 · t11 = x¹⁴⁴⁰      (10110100000)
t13 = t12 · t12 = x²⁸⁸⁰      (101101000000)
t14 = t13 · t7 = x²⁹²⁵       (101101101101)
Figure 2.6 Exponentiation: (a) using the method of squaring, and (b) eliminating common
computations. The number next to the equations denotes the binary representation of
the current exponent.
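The reuse in Figure 2.6(b) is easy to sketch in Python; the helper names are ours, and the decomposition follows the repeated bit pattern 2925 = 101101101101₂, i.e., 2925 = 45 + 45 · 2⁶.

def power_by_squaring(x, e):
    """Left-to-right square-and-multiply exponentiation (Figure 2.6(a))."""
    result = 1
    for bit in bin(e)[2:]:
        result *= result            # squaring step for every bit
        if bit == "1":
            result *= x             # extra multiply for a set bit
    return result

def power_2925_shared(x):
    """Compute x^2925 by reusing x^45, as in Figure 2.6(b)."""
    d1 = power_by_squaring(x, 5)    # x^5   (binary 101)
    d2 = d1 * (d1 ** 8)             # x^45  (binary 101101) = x^5 * (x^5)^8
    t = d2
    for _ in range(6):              # six squarings: (x^45)^(2^6) = x^2880
        t *= t
    return d2 * t                   # x^45 * x^2880 = x^2925

x = 3
assert power_by_squaring(x, 2925) == power_2925_shared(x) == x ** 2925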
2.7 Summary
References
[11] B. Schneier, Applied Cryptography: Protocols, Algorithms and Source Code in C, second
edition. New York, NY: John Wiley and Sons Inc, 1996.
[12] P. Downey, B. Leong, and R. Sethi, Computing sequences with addition chains,
SIAM Journal of Computing, 10, 638–46, 1981.
[13] M. A. Miranda, F. V. M. Catthoor, M. Janssen, and H. J. De Man, High-level address
optimization and synthesis techniques for data-transfer-intensive applications, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, 6, 677–86, 1998.
3 Software compilation
[Figure 3.1 flow: characters → lexical analysis → syntax tree → intermediate code generation → intermediate code → analysis, optimization, and code generation → output program.]
Figure 3.1 The basic structure of a compiler. Compilers are divided into two stages:
a frontend and a backend. The goal is to translate a source program into an output
program; this requires many different optimizations.
A compiler takes a source program as input and performs a series of steps to transform that program into an output program. We briefly describe each
of these steps in further detail.
Lexical analysis is the first step in compilation; it is often called “lexing” or
“scanning.” This is the act of breaking the input into a set of words or tokens.
A token is an atomic unit in the programming language and commonly includes
variable names, operations, type identifiers, keywords, numbers, and symbols.
One can draw a parallel between lexical analysis and converting letters to words.
Most specification languages specify the token syntax using a regular
language, and, therefore, valid tokens can be represented using a set of regular
expressions. Since every regular expression has an equivalent finite automaton,
we can recognize tokens by scanning the input program one character at a time,
following the appropriate transitions in the finite automaton, and outputting valid
tokens when we reach certain specified states. This stage can find only limited
types of errors, more specifically errors involved in creating tokens. For example,
it can determine that the characters “12abc” are not valid in the C language
since C specifies that variables must start with an alphabetic character.
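As a toy illustration of this idea (not the scanner of any real compiler), the sketch below uses regular expressions to split a character stream into tokens and to reject malformed tokens such as 12abc; the token classes and names are hypothetical.

import re

# Hypothetical token classes for a tiny C-like language.
TOKEN_SPEC = [
    ("BADNUM", r"\d+[A-Za-z_]\w*"),   # e.g., "12abc": digits glued to letters
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),      # identifiers start with a letter or '_'
    ("OP",     r"[+\-*/=<>]"),
    ("SYMBOL", r"[();{}]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),                 # any other character is a lexical error
]
LEXER = re.compile("|".join("(?P<{}>{})".format(n, p) for n, p in TOKEN_SPEC))

def tokenize(text):
    for m in LEXER.finditer(text):
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind in ("BADNUM", "ERROR"):
            raise SyntaxError("invalid token {!r}".format(m.group()))
        yield kind, m.group()

print(list(tokenize("count = count + 12;")))
# tokenize("12abc = 4;") raises SyntaxError: invalid token '12abc'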
Syntactic analysis takes the set of tokens from the lexical analysis stage and
groups them into meaningful phrases. This is most often done by creating a tree of
tokens, a parse tree, which specifies the relationship between the tokens. The tree
3.2 Basic software compiler structure 23
is built according to the rules of the formal grammar as denoted in the input
language specification. The parse tree is used in the subsequent stages for analysis
and optimization. In some sense, this stage can be viewed as grouping words
(tokens) into sentences (valid structures in the language). This stage is also often
referred to as “parsing.”
Semantic checking analyzes the parse tree to verify that the input program abides
by the requirements of the specification language. Several properties are con-
firmed. For example, object binding associates the use of every variable/function
to its definition. Definite assignment verifies that every variable is defined before it
is used. Type checking is performed on expressions to ensure that operations are
being performed on variables of the appropriate type. A symbol table, which
stores each variable’s type and location, is built during this stage and used for
checking as well as in the later stages of compilation.
The frontend of the compiler ends with the intermediate code generation. This
stage transforms the syntax tree into another representation. This representation
varies from compiler to compiler and depends on the input specification lan-
guage(s) that the compiler accepts as well as the target output language(s) that
the compiler produces. Optimizing compilers often use more than one inter-
mediate representation. In general, the representation is the starting point of the
transformation into the final output program. Therefore, the intermediate code
often looks somewhat similar to the output code. The subsequent optimizations
perform transformations on this intermediate code; hence, the representation
must be easy to change. Furthermore, it should retain the important features
of the input code, while simplifying the code by removing the unimportant
features.
We now discuss two common models of computation used for intermediate
representations – the data flow graph (DFG) and the control flow graph (CFG).
These graphs show the dependencies between operations in the code. Figure 3.2
displays the CFG for an implementation of a factorial function. The function is
broken into a set of basic blocks, which are the nodes of a CFG. A basic block is a
sequence of consecutive intermediate language statements in which flow of control
can only enter at the beginning and leave at the end. In other words, a basic block
is an atomic sequence of statements, i.e., if one of the statements is executed it
means that all other statements will also be executed. The arrows in the CFG
define control dependencies amongst the basic blocks. More formally, a CFG is a
directed multigraph in which: (1) the nodes are basic blocks and (2) the edges
represent flow of control (branches or fall-through execution). Note that the CFG
is formed statically; therefore, we have no information about the values of the
data. Hence, an edge in the CFG simply means there is a possibility to take that
path. Many arithmetic optimizations are performed on a CFG as we discuss in
Section 3.3.
A DFG is a directed acyclic graph where each node is a single instruction or
operation and each edge denotes a direct data dependency between the output of
one node and the input of another. Figure 3.2 shows a simple two-node DFG
Figure 3.2 CFG and DFG representations of the factorial function. The CFG displays
the control dependencies in the function while the DFG exhibits the data dependencies
for the statements within the function.
corresponding to the two statements in one of the basic blocks of the factorial
function. There are two operations in this basic block and equivalently two nodes
in the DFG. The subtract operation produces a data value n that is used by the
subsequent operation. Hence there is an edge from the subtract node to the
multiply node.
Most intermediate representations use some sort of CFG and DFG to model
dependencies. Of course, there are intermediate representations which use other
models of computation. This book focuses primarily on the CFG and the DFG.
We refer the interested reader to more advanced compiler books [1, 2] for further
information.
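One minimal way to hold these two graphs in memory is sketched below for a factorial-style loop; the data structures and field names are illustrative, not those of any particular compiler.

from dataclasses import dataclass, field

@dataclass
class BasicBlock:
    name: str
    statements: list
    successors: list = field(default_factory=list)   # CFG edges (possible control flow)

# CFG: one node per basic block, one edge per possible branch or fall-through.
cfg = {
    "entry": BasicBlock("entry", ["result = n"], ["test"]),
    "test":  BasicBlock("test",  ["if n <= 1 goto exit"], ["body", "exit"]),
    "body":  BasicBlock("body",  ["n = n - 1", "result = result * n"], ["test"]),
    "exit":  BasicBlock("exit",  ["return result"], []),
}

# DFG for the 'body' block: the subtraction produces the n consumed by the
# multiplication, so a single data edge connects the two operation nodes.
dfg_nodes = {"sub": "n - 1", "mul": "result * n"}
dfg_edges = [("sub", "mul")]

print(cfg["body"].successors, dfg_edges)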
The development of a compiler frontend is a fairly straightforward process.
There are a number of standard tools (e.g., lex, yacc [3]) to perform each of the
steps and the methodology is quite mature. On the other hand, most compiler
research is focused on the backend, which is still evolving. The backend may vary
significantly across optimizing compilers. As such, we will discuss the backend
stages at a high level, and focus our discussion on the portions that are pertinent to
this book. Referring again to Figure 3.1, we can see there are three stages in the
backend: analysis, optimization, and code generation.
The analysis stage gathers general information about the program structure.
Some typical analyses include deriving information about the data flow, control
flow, function calls, pointers, etc. The previously discussed CFG and DFG are
usually built during this stage. In addition, the call graph, which models function
calls, is often created at this time.
3.3 Algebraic transformations in optimizing software compilers
This section presents some related work on the optimization of arithmetic compu-
tations used in modern software compilers. The presented techniques are applied
to general purpose programs and arithmetic expressions. In particular, the discus-
sion focuses on various techniques for redundancy elimination used in modern
software compilers.
(a) Original flowgraph:

B1:  b = 5 * a
     c = 2 * b − 7
     if (b < c)
B2:  b = 0
B3:  d = 5 * a

(b) After eliminating the common subexpression 5 * a:

B1:  t = 5 * a
     b = t
     c = 2 * b − 7
     if (b < c)
B2:  b = 0
B3:  d = t
Figure 3.3 An example of applying CSE: (a) the original flowgraph, and (b) the flowgraph
after eliminating common subexpression 5 * a.
Redundancy elimination optimizations such as CSE are typically performed on an intermediate form of the code, such as the flowgraph
shown in Figure 3.3.
The local CSE procedure iterates through the basic block, adding entries to and removing them
from the list of available expressions (AEBs) as appropriate, inserting instructions to save the expressions’ values in
temporary variables, and modifying existing instructions to use the values saved
in temporary variables. The iteration stops when no further common subexpres-
sion exists.
The global CSE procedure operates on the entire function, or equivalently the
CFG, and finds the available expressions. An expression exp is said to be available
at the entry to a basic block if there is an evaluation of exp on every control path
from the entry to this block that is not killed before the entry to the basic block
(an expression is killed if one or more of its operands is assigned a new value). The
set of available expressions can be found as follows. Assume that EVAL(i) is the set
of expressions evaluated in block i available at the block’s exit. Further, assume
KILL(i) denotes the set of expressions killed by block i. EVAL(i) is computed by
scanning block i from the beginning to the end, accumulating the expressions
evaluated in it, and deleting those expressions whose operands are later assigned
new values inside the block. AEin(i) and AEout(i) represent the sets of available
expressions on entry to and exit from block i, respectively, as shown in the
data-flow equations in (3.1):

AEin(i) = ∩_{j ∈ Pred(i)} AEout(j)
AEout(i) = EVAL(i) ∪ (AEin(i) − KILL(i))      (3.1)
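A small sketch of how these equations can be solved by straightforward iteration is shown below, assuming EVAL and KILL have already been computed for each block; the function and the encoding of Figure 3.3 are our own.

def available_expressions(blocks, preds, eval_sets, kill_sets):
    """Iteratively solve the data-flow equations (3.1) to a fixed point."""
    universe = set().union(*eval_sets.values())
    ae_in = {b: set() for b in blocks}
    ae_out = {b: set(universe) for b in blocks}     # optimistic initialization
    changed = True
    while changed:
        changed = False
        for b in blocks:
            ae_in[b] = (set.intersection(*(ae_out[p] for p in preds[b]))
                        if preds[b] else set())
            new_out = eval_sets[b] | (ae_in[b] - kill_sets[b])
            if new_out != ae_out[b]:
                ae_out[b], changed = new_out, True
    return ae_in, ae_out

# Figure 3.3: B1 branches to B2 and B3, and B2 falls through to B3.
blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
eval_sets = {"B1": {"5*a", "2*b-7"}, "B2": set(), "B3": {"5*a"}}
kill_sets = {"B1": set(), "B2": {"2*b-7"}, "B3": set()}
ae_in, _ = available_expressions(blocks, preds, eval_sets, kill_sets)
print(ae_in["B3"])   # {'5*a'}: the recomputation d = 5*a can become d = t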
Figure 3.4 An example of value numbering: (a) the original code, (b) the code
transformed using value numbering, and (c) the simplified code.
(a)  b = a + 3
     c = a
     d = c + 3

(b)  b = 5 * a
     if (a > 0)
         c = 5 * a
Figure 3.5 Examples showing the difference between value numbering and CSE’s capabilities:
(a) an example which can be simplified using value numbering, but not CSE, and (b) an
example which can be simplified using CSE, but not value numbering.
In Figure 3.5(a), value numbering determines that b and d always hold the same
value, while CSE cannot, because the expressions a + 3 and c + 3 do not match under a
simple lexicographic search. On the other hand, Figure 3.5(b) shows an example
where global CSE is able to determine that expression (5 * a) appears twice, but
value numbering cannot detect it because variables b and c are not always equal.
For example, if the value of a is not greater than 0, then c = 5 * a will not be
executed; therefore, b and c may have different values.
The original formulation of value numbering operates on individual basic
blocks, but has been extended to a global form [5, 6]. To use value numbering
for basic blocks, hashing is used to partition expressions into classes. Upon
encountering an expression, its hash value is computed. If it is not already among
the expressions with that hash value, it is added to them. The hash function
and the expression matching function are defined to take commutativity of the
operators into account.
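A hash-table-based sketch of local value numbering, in the spirit described above, is given below; the tuple encoding of statements and all of the names are our own simplifications.

def value_number_block(statements):
    """Local value numbering over (dest, op, arg1, arg2) tuples of one basic block."""
    next_vn = 0
    var_vn = {}       # variable name  -> value number
    expr_vn = {}      # (op, vn, vn)   -> value number
    vn_home = {}      # value number   -> a variable currently holding it
    rewritten = []

    def vn_of(name):
        nonlocal next_vn
        if name not in var_vn:
            var_vn[name], vn_home[next_vn] = next_vn, name
            next_vn += 1
        return var_vn[name]

    for dest, op, a, b in statements:
        if op == "copy":                       # dest = a: dest inherits a's value number
            var_vn[dest] = vn_of(a)
            rewritten.append((dest, op, a, b))
            continue
        if op in {"+", "*"}:                   # commutative: canonical operand order
            key = (op,) + tuple(sorted((vn_of(a), vn_of(b))))
        else:
            key = (op, vn_of(a), vn_of(b))
        if key in expr_vn:                     # this value was computed before
            vn = expr_vn[key]
            rewritten.append((dest, "copy", vn_home[vn], None))
        else:
            vn = expr_vn[key] = next_vn
            vn_home[vn] = dest
            next_vn += 1
            rewritten.append((dest, op, a, b))
        var_vn[dest] = vn
    return rewritten

# b = a + 3; c = a; d = c + 3: d is recognized as a copy of b (Figure 3.5(a)).
print(value_number_block([("b", "+", "a", "3"),
                          ("c", "copy", "a", None),
                          ("d", "+", "c", "3")]))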
(a)
for i = 1, 100 {
    a = i * (n + 1)
    for j = 1, 100
        b(i, j) = 100 * n + 10 * a + j
}

(b)
a1 = 10 * (n + 1)
a2 = 100 * n
for i = 1, 100 {
    a3 = a1 * i + a2
    for j = 1, 100
        b(i, j) = a3 + j
}
Figure 3.6 (a) An example of code having loop invariant computations and (b) the code
after transformation.
Figure 3.6(a) shows a piece of code in which there are several loop invariant
expressions. Figure 3.6(b) shows the code after moving the loop invariant expres-
sions outside of the loops. The original code performs two multiplications and two
additions per iteration of the inner loop. The outer loop performs 201 multipli-
cations and 201 additions during each iteration. Overall, the code executes 20 100
multiplications and 20 100 additions.
The modified code performs one addition at each iteration of the inner loop. The
outer loop requires 1 multiplication and 101 additions for each iteration, resulting
in a total of 102 multiplications and 10 101 additions. In this example, loop invari-
ant code motion saves 19 998 multiplications and 9 999 additions. This can have a
significant impact on the execution time of the code and the energy consumption of
the processor executing it. Further improvement can be achieved by modifying the
ranges of i and j in the FOR loops, e.g., instead of using 1 and 100 as the lower and
upper bounds for variable j, a3 + 1 and a3 + 100 can be used, respectively.
Figure 3.7 An example of the benefit of PRE: (a) the original partially redundant code,
and (b) the simplified code.
Figure 3.8 PRE can move loop invariant computations to reduce the number of operations:
(a) the original loop, and (b) the loop after moving the loop invariant computation
outside the loop.
(a)
for j = 1, 100
    a(5 * j + 2) = 1

(b)
t = 2
for j = 1, 100 {
    t = t + 5
    a(t) = 1
}
Figure 3.9 An example showing the benefits of strong strength reduction: (a) the original
loop and (b) the loop after performing strong strength reduction.
The advantages of the Horner form are the following:
(1) Evaluating a degree-n polynomial in its original form requires
n(n + 1)/2 multiplications and n additions,¹ while the Horner form uses n
multiplications and n additions. This substantially reduces the number of
multiplications if n is large. Since multiplication is an expensive operation in
terms of cycle time and energy consumption, transforming to the Horner form
is an effective way of reducing execution time and energy consumption of
software programs. In [9], the authors report an average 55% reduction in the
number of multiplications when the Horner form is used instead of unopti-
mized expressions for a set of applications.
(2) The special form of the resulting polynomial eases the use of MAC operations,
which exist in many processors especially digital signal processors. The poly-
nomial can be calculated by first computing P0 = a0·x + a1 using a MAC
operation, then calculating P1 = P0·x + a2 and so on. This means the calcula-
tion can be done using n MAC operations. Again, this significantly reduces the
execution time and the energy consumption.
(3) The Horner form increases the numerical stability. In the original form, even
when the value of P(x) is small, the intermediate values (e.g., xⁿ, a0·xⁿ, a0·xⁿ +
a1·xⁿ⁻¹) can be prohibitively large. Thus, it may not be possible to represent
them directly in a 32-bit or a 64-bit processor.² On the other hand, in the
Horner form, the intermediate values P0, P1, ..., Pn−1 can be small if, for example,
a0 and a1 have different signs.
(4) It is easy to write polynomials in the Horner form. Therefore, it can be
integrated into a compiler with little effort. Furthermore, the transformation
of polynomials into the Horner form can be done quickly, which means it will
not have a major impact on the compilation time which is very important in
general purpose compilers.
The disadvantages of the Horner form are the following:
(1) It optimizes only a single polynomial at a time; it does not look for common
subexpressions among a set of polynomials. Furthermore, it is not good at
optimizing multivariate polynomials, used for example in computer graphics
applications [10]. Equation (2.2) shows the quartic polynomial used in three-
dimensional computer graphics for modeling textures. The original polyno-
mial consists of 23 multiplications or 21 multiplications and 2 shift operations.
The polynomial in the Horner form (shown in Equation (2.4)) has 17 multipli-
cations or 15 multiplications and 2 shifts. Using algebraic methods [9], the
polynomial can be optimized to a form which requires 13 multiplications or
12 multiplications and 1 shift.
(2) The Horner form may not give the best result if some coefficients of the
polynomial are zero. For P(x) = a0x⁶ + a2x⁴ + a4x² + a6 the Horner method
results in P(x) = (((((a0x + 0)x + a2)x + 0)x + a4)x + 0)x + a6, which requires
six MAC operations or six multiplications and three additions, while it
is easy to write the polynomial using y = x · x as P(x) = a0y³ + a2y² + a4y
+ a6 = ((a0y + a2)y + a4)y + a6. In this case, a total of four MAC operations or
four multiplications and three additions are necessary (see the sketch below).

¹ Calculating the term aᵢxⁿ⁻ⁱ requires n − i multiplications. Therefore, the total number of necessary
multiplications is n + (n − 1) + ... + 1 = n(n + 1)/2.
² It is possible to use several words to represent a large number on a processor, but this significantly
reduces the performance.

Figure 3.10 An example illustrating the power of algebraic techniques: (a) the original
equations, (b) the equations optimized using an algebraic technique, and (c) the equations
optimized using CSE.
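The evaluation orders discussed above can be contrasted with the short sketch below (our own illustration): the straightforward form, the Horner form, and the rewriting in y = x · x from item (2).

def poly_naive(coeffs, x):
    """P(x) = a0*x^n + a1*x^(n-1) + ... + an, every power computed from scratch."""
    n = len(coeffs) - 1
    return sum(a * x ** (n - i) for i, a in enumerate(coeffs))

def poly_horner(coeffs, x):
    """Horner form: n multiplications and n additions (one MAC per coefficient)."""
    result = 0
    for a in coeffs:
        result = result * x + a
    return result

def poly_even_powers(even_coeffs, x):
    """a0*x^6 + a2*x^4 + a4*x^2 + a6 evaluated in y = x*x: 4 multiplications, 3 additions."""
    y = x * x
    return poly_horner(even_coeffs, y)

coeffs = [2, 0, -3, 0, 1, 0, 5]    # 2x^6 - 3x^4 + x^2 + 5
print(poly_naive(coeffs, 1.5),
      poly_horner(coeffs, 1.5),
      poly_even_powers([2, -3, 1, 5], 1.5))   # all three values agree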
3.4 Summary
This chapter presented some of the basic concepts in software design flows. We
started by describing the fundamental steps for a software compiler including
those in the frontend and the backend. Then we provided more detail on the
compilation process, including where arithmetic optimization can be performed.
³ Value numbering uses properties such as commutativity, but this is not specific to arithmetic
operations.
References
[1] S. S. Muchnick, Advanced Compiler Design and Implementation. San Francisco, CA:
Morgan Kaufmann Publishers, 1997.
[2] K. Kennedy and J. R. Allen, Optimizing Compilers for Modern Architectures:
A Dependence-based Approach. San Francisco, CA: Morgan Kaufmann Publishers,
2001.
[3] J. R. Levine, T. Mason, and D. Brown, Lex & yacc, second edition. Sebastopol, CA:
O’Reilly & Associates, 1995.
[4] J. Cocke and J. T. Schwartz, Programming Languages and Their Compilers:
Preliminary Notes, Technical Report, Courant Institute of Mathematical Sciences,
New York University, 1970.
[5] J. R. Reif and H. R. Lewis, Symbolic evaluation and the global value graph,
Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming
Languages, Los Angeles, 1977, pp. 104–18. New York, NY: ACM, 1977.
[6] B. Alpern, M. N. Wegman and F. K. Zadeck, Detecting equality of variables in
programs, Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages, San Diego, 1988, pp. 1–11. New York, NY: ACM, 1988.
[7] A. Sinha and A. P. Chandrakasan, JouleTrack – a web based tool for software energy
profiling, Proceedings of the 38th Conference on Design Automation, Las Vegas, 2001.
pp. 220–225. New York, NY: ACM, 2001.
[8] http://www.gnu.org/software/libc/
[9] A. Hosangadi, F. Fallah and R. Kastner, Factoring and eliminating common
subexpressions in polynomial expressions, Proceedings of the 2004 IEEE/ACM
International Conference on Computer-aided Design, San Jose, 2004, pp. 169–174.
Washington, DC: IEEE Computer Society, 2004.
[10] G. Nurnberger, J. W. Schmidt and G. Walz, Multivariate Approximation and Splines.
Basel: Birkhäuser, 1997.
4 Hardware synthesis
This chapter provides a brief summary of the stages in the hardware synthesis
design flow. It is designed to give unfamiliar readers a high-level understanding
of the hardware design process. The material in subsequent chapters describes
different hardware implementations of polynomial expressions and linear systems.
Therefore, we feel that it is important, though not necessarily essential, to have an
understanding of the hardware synthesis process.
The chapter starts with a high-level description of the hardware synthesis
design flow. It then proceeds to discuss the various components of this design
flow. These include the input system specification, the program representation,1
algorithmic optimizations, resource allocation, operation scheduling, and resource
binding. The chapter concludes with a case study using an FIR filter. This
provides a step-by-step example of the hardware synthesis process. Additionally,
it gives insight into the hardware optimization techniques presented in the
following chapters.
The initial stages of a hardware design flow are quite similar to the frontend of
a software compiler. One of the biggest differences is that the input system
specification languages are different. Hardware description languages must deal
with many features that are unnecessary in software, which for the most part
model execution in a serial fashion. Such features include the need to model
concurrent execution of the underlying resources, define a variety of different data
types specifically for different bit widths, and introduce some notion of time into
the language. Figure 4.1 gives a high-level view of the different stages of hardware
compilation.
Architectural synthesis is an automated design process that interprets an algo-
rithmic representation of a behavior and creates a hardware specification that implements it.
¹ We use the term "program representation," a common term in software compilation, due to the
absence of a widely used term in hardware synthesis.
[Figure 4.1 flow: system specification → lexical and syntactic analysis → program representation → algorithmic optimization (common subexpression elimination, constant folding, and other common compiler techniques, the focus of this book) → resource allocation and scheduling → resource binding → register transfer level description, followed by logic synthesis and physical synthesis (floorplanning, placement, and routing) down to GDSII.]
Figure 4.1 A high-level view of the stages of hardware compilation. These can be broadly
broken down to architectural, logic, and physical synthesis. The optimizations described in
this book are primarily focused on architectural synthesis, specifically on the algorithmic
optimization.
Throughput, power, clock frequency, and latency are some of the common
optimization objectives.
The first step of architectural synthesis is lexical and syntactic analysis, which
parses the input specification into a program representation. This step is very
similar to that of software compilation and more details of this can be found in
Section 3.2. The program representation is a description of the system specification
that is easily amenable to analysis, optimization, and translation to a more refined
specification, which in this case is the register transfer level description. There are
many examples of program representations; we discuss some of them later in this
chapter. The DFG is perhaps the most popular program representation for
architectural synthesis. We formally describe this in Section 4.4. However, in order
to progress our discussion to the next steps in the architectural synthesis process, we
will now informally define it as a directed graph consisting of vertices that represent
operations and directed edges that denote dependencies between operations.
The architectural synthesis problem can be defined in the following manner:
given a system specification, a set of fully characterized architectural resources,
a set of constraints, and an optimization function, determine a connected set of
resources (a structural representation) that conforms to the given constraints and
minimizes the objective function. The architectural synthesis problem can be split
into the following subproblems: algorithmic optimization, resource allocation,
operation scheduling, and operation binding.
Algorithmic optimization uses a set of techniques that transform the program
representation to make it run faster, use fewer operations, expose parallelism,
enable more accurate dependency analysis, improve memory usage, and so on.
These techniques are very often similar to those found in software compilers and
include optimizations such as CSE, loop unrolling, and dead code elimination. The
techniques that we present later in this book for polynomial and linear system
optimization can be used in this stage.
Resource allocation is the act of choosing the appropriate number and type of
components from a library. For example, you can choose to have two adders – one
ripple carry and one carry look-ahead – one multiplier, one divider, etc. Scheduling
determines the temporal ordering of the operations. Given a set of operations with
execution delays and a partial ordering, the scheduling problem assigns a start
time for each operation. The start times must follow the precedence constraints as
specified in the system specification. Additional restrictions such as timing and
area constraints may be added to the problem, depending on the target architec-
ture. The scheduling affects the resource allocation and vice-versa. Therefore, the
ordering of these two tasks is sometimes interchanged; some synthesis tools
perform scheduling, then resource allocation, while others allocate the resources
first, and then schedule the operations.
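As one concrete, deliberately simple example of a scheduling policy, the sketch below computes an as-soon-as-possible (ASAP) schedule over a small DFG while ignoring resource constraints; the operation names and delays are illustrative.

def asap_schedule(ops, deps, delay):
    """Start each operation as early as its DFG predecessors allow.

    ops:   operation names
    deps:  dict mapping an operation to the operations it depends on
    delay: dict mapping an operation to its execution delay in cycles
    """
    start = {}
    def earliest(op):
        if op not in start:
            start[op] = max((earliest(p) + delay[p] for p in deps[op]), default=0)
        return start[op]
    for op in ops:
        earliest(op)
    return start

# Two multiplications feed an addition; the addition cannot start before cycle 2.
ops = ["m1", "m2", "a1"]
deps = {"m1": [], "m2": [], "a1": ["m1", "m2"]}
delay = {"m1": 2, "m2": 2, "a1": 1}
print(asap_schedule(ops, deps, delay))   # {'m1': 0, 'm2': 0, 'a1': 2}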
Resource binding is the assignment of each operation to a specific hardware
component; it is an explicit mapping between operations and resources. The goal
of resource binding is to minimize the area by allowing multiple operations to
share a common resource. The scheduling limits the possible resource bindings.
38 Hardware synthesis
For example, operations that are scheduled at the same time cannot share the
same resource. To be more precise, any two operations can be bound to the same
resource if they are not executed concurrently, i.e., are not scheduled in overlap-
ping time steps. Some resources are capable of executing different operations, e.g.,
both an addition and subtraction can be bound to an arithmetic logic unit (ALU).
The resource binding can greatly affect the area and latency of the circuit as it
dictates the number of interconnect logic and storage elements of the circuit.
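A minimal greedy binding along these lines is sketched below: an operation may share a resource already in use only when the resource type matches and the two execution intervals do not overlap. The operation types, delays, and schedule are illustrative.

def bind_operations(schedule, delay, op_type):
    """Greedily bind scheduled operations to the fewest resources found this way."""
    resources = []                    # list of (type, [(start, end), ...]) already bound
    binding = {}
    for op in sorted(schedule, key=schedule.get):
        s, e = schedule[op], schedule[op] + delay[op]
        for idx, (rtype, intervals) in enumerate(resources):
            no_overlap = all(e <= s2 or s >= e2 for s2, e2 in intervals)
            if rtype == op_type[op] and no_overlap:
                intervals.append((s, e))
                binding[op] = idx
                break
        else:                          # no compatible resource: allocate a new one
            resources.append((op_type[op], [(s, e)]))
            binding[op] = len(resources) - 1
    return binding, len(resources)

# The two multiplications overlap in time, so two multipliers are needed;
# the addition is the only add-type operation and gets its own adder.
schedule = {"m1": 0, "m2": 0, "a1": 2}
delay = {"m1": 2, "m2": 2, "a1": 1}
op_type = {"m1": "mult", "m2": "mult", "a1": "add"}
print(bind_operations(schedule, delay, op_type))   # ({'m1': 0, 'm2': 1, 'a1': 2}, 3)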
Logic synthesis is the act of taking the register transfer level description that is
output from architectural synthesis and transforming it into a network of logic
gates. There are a number of optimizations that are performed during this stage.
The optimizations are generally grouped into two types – multi-level and two-
level. The two-level optimizations have roots in Boolean minimization, which
attempts to minimize the number of gates in a two-stage Boolean network.
Multi-level optimizations often view the problem as a network of logic gates and
attempt to minimize the number and the area of the gates as well as the critical
path or the delay of the network. An interested reader can find a vast amount of
literature on this topic. Reference [1] is a good introduction to the basic algorithms.
Physical synthesis or physical design looks at how the logical network can be
transformed into an integrated circuit that can be fabricated. The output is
essentially a set of planar geometric shapes that detail the size and the type of
materials needed to make the transistors and wires in a circuit. GDSII is one
common database format used to specify the layout of the integrated circuit. The
primary tasks of physical synthesis are floorplanning, placement, and routing.
Floorplanning creates a basic plan for the layout of the chip, indicating the general
area where hard macros, power and ground planes, input/output (I/O) and other
logic elements reside. Placement assigns an exact physical location for each of the
logic gates, while routing determines the precise wiring of the required intercon-
nections between the gates. Further information on the stages of physical synthe-
sis, as well as the algorithms used to implement these stages, can be found in [2].
Now that we have given an overview of the entire hardware synthesis process, we
will go into detail on a few of the topics that are needed to fully understand the
later chapters in this book. Specifically, we discuss the architectural synthesis
process. The majority of the optimizations in this book occur in the algorithmic
optimization stage; however, it is important to understand the other stages of
architectural synthesis, which we focus on in the remainder of the chapter.