Deep Learning in
Computer Vision
Digital Imaging and Computer Vision Series
Series Editor
Rastislav Lukac
Foveon, Inc./Sigma Corporation, San Jose, California, U.S.A.
Dermoscopy Image Analysis
by M. Emre Celebi, Teresa Mendonça, and Jorge S. Marques
Semantic Multimedia Analysis and Processing
by Evaggelos Spyrou, Dimitris Iakovidis, and Phivos Mylonas
Microarray Image and Data Analysis: Theory and Practice
by Luis Rueda
Perceptual Digital Imaging: Methods and Applications
by Rastislav Lukac
Image Restoration: Fundamentals and Advances
by Bahadir Kursat Gunturk and Xin Li
Image Processing and Analysis with Graphs: Theory and Practice
by Olivier Lézoray and Leo Grady
Visual Cryptography and Secret Image Sharing
by Stelvio Cimato and Ching-Nung Yang
Digital Imaging for Cultural Heritage Preservation: Analysis,
Restoration, and Reconstruction of Ancient Artworks
by Filippo Stanco, Sebastiano Battiato, and Giovanni Gallo
Computational Photography: Methods and Applications
by Rastislav Lukac
Super-Resolution Imaging
by Peyman Milanfar
Deep Learning in
Computer Vision
Principles and Applications

Edited by
Mahmoud Hassaballah and Ali Ismail Awad
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-138-54442-0 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have
been made to publish reliable data and information, but the author and publisher cannot assume responsibility
for the validity of all materials or the consequences of their use. The authors and publishers have attempted to
trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if
permission to publish in this form has not been obtained. If any copyright material has not been acknowledged
please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration
for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate
system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Names: Hassaballah, Mahmoud, editor. | Awad, Ali Ismail, editor.


Title: Deep learning in computer vision : principles and applications /
edited by M. Hassaballah and Ali Ismail Awad.
Description: First edition. | Boca Raton, FL : CRC Press/Taylor and
Francis, 2020. | Series: Digital imaging and computer vision | Includes
bibliographical references and index.
Identifiers: LCCN 2019057832 (print) | LCCN 2019057833 (ebook) | ISBN
9781138544420 (hardback ; acid-free paper) | ISBN 9781351003827 (ebook)
Subjects: LCSH: Computer vision. | Machine learning.
Classification: LCC TA1634 .D437 2020 (print) | LCC TA1634 (ebook) | DDC
006.3/7--dc23
LC record available at https://lccn.loc.gov/2019057832
LC ebook record available at https://lccn.loc.gov/2019057833

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com
Contents
Foreword ..................................................................................................................vii
Preface.......................................................................................................................ix
Editors Bio ............................................................................................................. xiii
Contributors ............................................................................................................. xv

Chapter 1 Accelerating the CNN Inference on FPGAs ........................................ 1
Kamel Abdelouahab, Maxime Pelcat, and François Berry

Chapter 2 Object Detection with Convolutional Neural Networks ..................... 41
Kaidong Li, Wenchi Ma, Usman Sajid, Yuanwei Wu, and Guanghui Wang

Chapter 3 Efficient Convolutional Neural Networks for Fire Detection in Surveillance Applications .................................................................. 63
Khan Muhammad, Salman Khan, and Sung Wook Baik

Chapter 4 A Multi-biometric Face Recognition System Based on Multimodal Deep Learning Representations ..................................... 89
Alaa S. Al-Waisy, Shumoos Al-Fahdawi, and Rami Qahwaji

Chapter 5 Deep LSTM-Based Sequence Learning Approaches for Action and Activity Recognition .................................................... 127
Amin Ullah, Khan Muhammad, Tanveer Hussain, Miyoung Lee, and Sung Wook Baik

Chapter 6 Deep Semantic Segmentation in Autonomous Driving ................... 151
Hazem Rashed, Senthil Yogamani, Ahmad El-Sallab, Mahmoud Hassaballah, and Mohamed ElHelw

Chapter 7 Aerial Imagery Registration Using Deep Learning for UAV Geolocalization ....................................................................... 183
Ahmed Nassar and Mohamed ElHelw

Chapter 8 Applications of Deep Learning in Robot Vision .............................. 211
Javier Ruiz-del-Solar and Patricio Loncomilla

Chapter 9 Deep Convolutional Neural Networks: Foundations and Applications in Medical Imaging ..................................................... 233
Mahmoud Khaled Abd-Ellah, Ali Ismail Awad, Ashraf A. M. Khalaf, and Hesham F. A. Hamed

Chapter 10 Lossless Full-Resolution Deep Learning Convolutional Networks for Skin Lesion Boundary Segmentation ......................... 261
Mohammed A. Al-masni, Mugahed A. Al-antari, and Tae-Seong Kim

Chapter 11 Skin Melanoma Classification Using Deep Convolutional Neural Networks ............................................................................. 291
Khalid M. Hosny, Mohamed A. Kassem, and Mohamed M. Foaud

Index ...................................................................................................................... 315
Foreword
Deep learning, while it has multiple definitions in the literature, can be defined as
"inference of model parameters for decision making in a process mimicking the
understanding process in the human brain"; or, in short: "brain-like model
identification". We can say that deep learning is a way of data inference in machine
learning, and the two together are among the main tools of modern artificial
intelligence. Novel technologies away from traditional academic research have fueled
R&D in convolutional neural networks (CNNs); companies like Google, Microsoft,
and Facebook ignited the "art" of data manipulation, and the term "deep learning"
became almost synonymous with decision making.

Various CNN structures have been introduced and invoked in many computer
vision-related applications, with greatest success in face recognition, autonomous
driving, and text processing. The reality is: deep learning is an art, not a science.
This state of affairs will remain until its developers develop the theory behind its
functionality, which would lead to "cracking its code" and explaining why it works,
and how it can be structured as a function of the information gained with data. In
fact, with deep learning, there is good and bad news. The good news is that the
industry—not necessarily academia—has adopted it and is pushing its envelope. The bad
news is that the industry does not share its secrets. Indeed, industries are never
interested in procedural and textbook-style descriptions of knowledge.

This book, Deep Learning in Computer Vision: Principles and Applications—as
a journey in the progress made through deep learning by academia—confines itself
to deep learning for computer vision, a domain that studies sensory information
used by computers for decision making, and has had its impacts and drawbacks for
nearly 60 years. Computer vision has been and continues to be a system: sensors,
computer, analysis, decision making, and action. This system takes various forms,
and the flow of information within its components does not necessarily happen in
tandem. The linkages between computer vision and machine learning, and between it
and artificial intelligence, are very fuzzy, as is the linkage between computer vision
and deep learning. Computer vision has moved forward, showing amazing progress in
its short history. During the sixties and seventies, computer vision dealt mainly with
capturing and interpreting optical data. In the eighties and nineties, geometric
computer vision added science (geometry plus algorithms) to computer vision. During
the first decade of the new millennium, modern computing contributed to the
evolution of object modeling using multimodality and multiple imaging. By the end of
that decade, a lot of data became available, and so the term "deep learning" crept
into computer vision, as it did into machine learning, artificial intelligence, and other
domains.

This book shows that traditional applications in computer vision can be solved
through invoking deep learning. The applications addressed and described in the
eleven different chapters have been selected in order to demonstrate the capabilities
of deep learning algorithms to solve various issues in computer vision. The content
of this book has been organized such that each chapter can be read independently
of the others. Chapters of the book cover the following topics: accelerating the CNN
inference on field-programmable gate arrays, fire detection in surveillance
applications, face recognition, action and activity recognition, semantic segmentation for
autonomous driving, aerial imagery registration, robot vision, tumor detection, and
skin lesion segmentation as well as skin melanoma classification.

From the assortment of approaches and applications in the eleven chapters, the
common thread is that deep learning for identification of CNN models provides
accuracy over traditional approaches. This accuracy is attributed to the flexibility of
CNNs and the availability of large data to enable identification through the deep
learning strategy. I would expect the content of this book to be welcomed worldwide by
graduate and postgraduate students and workers in computer vision, including
practitioners in academia and industry. Additionally, professionals who want to explore
the advances in concepts and implementation of deep learning algorithms applied to
computer vision may find in this book an excellent guide for such purpose. Finally,
I hope that readers will find the presented chapters in the book interesting and
inspiring to future research, from both theoretical and practical viewpoints, to spur
further advances in discovering the secrets of deep learning.

Prof Aly Farag, PhD, Life Fellow, IEEE, Fellow, IAPR


Professor of Electrical and Computer Engineering
University of Louisville, Kentucky
Preface
Simply put, computer vision is an interdisciplinary field of artificial intelligence that
aims to guide computers and machines toward understanding the contents of digital
data (i.e., images or video). According to computer vision achievements, the future
generation of computers may understand human actions, behaviors, and languages
similarly to humans, carry out some missions on their behalf, or even communicate
with them in an intelligent manner. One aspect of computer vision that makes it
such an interesting topic of study and active research field is the amazing diversity
of daily-life applications such as pedestrian protection systems, autonomous driving,
biometric systems, the movie industry, driver assistance systems, video surveillance,
and robotics as well as medical diagnostics and other healthcare applications. For
instance, in healthcare, computer vision algorithms may assist healthcare
professionals to precisely classify illnesses and cases; this can potentially save patients' lives
through excluding inaccurate medical diagnoses and avoiding erroneous treatment.
With this wide variety of applications, there is a significant overlap between
computer vision and other fields such as machine vision and image processing. Scarcely
a month passes in which we do not hear from the research and industry communities
an announcement of some new technological breakthrough in the areas of
intelligent systems related to the computer vision field.

With the recent rapid progress on deep convolutional neural networks, deep
learning has achieved remarkable performance in various fields. In particular, it has
brought a revolution to the computer vision community, introducing non-traditional and
efficient solutions to several problems that had long remained unsolved. Due to this
promising performance, it is gaining more and more attention and is being applied widely in
computer vision for several tasks such as object detection and recognition, object
segmentation, pedestrian detection, aerial imagery registration, video processing, scene
classification, autonomous driving, and robot localization as well as medical
image-related applications. If the phrase "deep learning for computer vision" is searched in
Google, millions of search results will be obtained. Under these circumstances, a book
entitled Deep Learning in Computer Vision that covers recent progress and
achievements in utilizing deep learning for computer vision tasks will be extremely useful.

The purpose of this contributed volume is to fill the existing gap in the literature
for the applications of deep learning in computer vision and to provide a bird's eye
view of recent state-of-the-art models designed for practical problems in computer
vision. The book presents a collection of eleven high-quality chapters written by
renowned experts in the field. Each chapter provides the principles and fundamentals
of a specific topic, introduces reviews of up-to-date techniques, presents outcomes,
and points out challenges and future directions. In each chapter, figures, tables,
and examples are used to improve the presentation and analysis of covered topics.
Furthermore, bibliographic references are included in each chapter, providing a good
starting point for deeper research and further exploration of the topics considered in
this book. Further, this book is structured such that each chapter can be read
independently from the others, as follows:


Chapter 1 presents the state of the art of CNN inference accelerators over FPGAs.
Computational workloads, parallelism opportunities, and the involved memory
accesses are analyzed. At the level of neurons, optimizations of the convolutional and
fully connected layers are explained and the performances of the different methods
compared, while at the network level, approximate computing and data-path
optimization methods are covered and state-of-the-art approaches compared. The methods
and tools investigated in this chapter represent the recent trends in FPGA CNN
inference accelerators and will fuel future advances in efficient hardware deep learning.

Chapter 2 concentrates on the object detection problem using deep CNNs (DCNNs):
the recent developments of several classical CNN-based object detectors are discussed.
These detectors significantly improve detection performance either through
employing new architectures or through solving practical issues like degradation,
gradient vanishing, and class imbalance. Detailed background information is provided to
show the progress and improvements of different models. Some evaluation results
and comparisons are reported on three datasets with distinctive characteristics.

Chapter 3 proposes three methods for fire detection using CNNs. The first method
focuses on early fire detection with an adaptive prioritization mechanism for
surveillance cameras. The second CNN-assisted method improves fire detection accuracy with
a main focus on reducing false alarms. The third method uses an efficient deep CNN
for fire detection. For localization of fire regions, a feature map selection algorithm that
intelligently selects appropriate feature maps sensitive to fire areas is proposed.

Chapter 4 presents an accurate and real-time multi-biometric system for
identifying a person's identity using a combination of two discriminative deep learning
approaches to address the problem of unconstrained face recognition: CNN and deep
belief network (DBN). The proposed system is tested on four large-scale challenging
datasets with high diversity in the facial expressions—SDUMLA-HMT, FRGC V
2.0, UFI, and LFW—and new state-of-the-art recognition rates on all the employed
datasets are achieved.

Chapter 5 introduces a study of the concept of sequence learning using RNNs,
LSTMs, and LSTM variants such as multilayer LSTM and bidirectional LSTM for action
and activity recognition problems. The chapter concludes with major issues of
sequence learning for action and activity recognition and highlights
recommendations for future research.

Chapter 6 discusses semantic segmentation in autonomous driving applications,
where it focuses on constructing efficient and simple architectures to demonstrate
the benefit of flow and depth augmentation to CNN-based semantic segmentation
networks. The impact of both motion and depth information on semantic
segmentation is experimentally studied using four simple network architectures. Results of
experiments on two public datasets—Virtual-KITTI and CityScapes—show
reasonable improvement in overall accuracy.

Chapter 7 presents a method based on deep learning for geolocalizing drones
using only onboard cameras. A pipeline has been implemented that makes use of the
availability of satellite imagery and traditional computer vision feature detectors and
descriptors, along with renowned deep learning methods (semantic segmentation), to
be able to locate the aerial image captured from the drone within the satellite
imagery. The method enables the drone to be autonomously aware of its surroundings and
navigate without using GPS.

Chapter 8 is intended to be a guide for the developers of robot vision systems,
focusing on the practical aspects of the use of deep neural networks rather than on
theoretical issues.

The last three chapters are devoted to deep learning in medical applications.
Chapter 9 covers basic information about CNNs in medical applications. CNN
developments are discussed from different perspectives, specifically CNN design,
activation function, loss function, regularization, optimization, normalization, and
network depth. Also, a deep convolutional neural network (DCNN) is designed for
brain tumor detection using MRI images. The proposed DCNN architecture is
evaluated on the RIDER dataset, achieving accurate detection within a time of
0.24 seconds per MRI image.

Chapter 10 discusses automatic segmentation of skin lesion boundaries from
surrounding tissue and presents a novel deep learning segmentation methodology via a
full-resolution convolutional network (FrCN). Experimental results show the great
promise of the FrCN method in overall segmentation compared to state-of-the-art
deep learning segmentation approaches such as fully convolutional networks (FCN),
U-Net, and SegNet.

Chapter 11 is about the automatic classification of color skin images, where a
highly accurate method is proposed for skin melanoma classification utilizing two
modified deep convolutional neural networks and consisting of three main steps.
The proposed method is tested using the well-known MED-NODE and DermIS &
DermQuest datasets.

It is worth mentioning here that the book is a small piece in the puzzle
of computer vision and its applications. We hope that our readers find the presented
chapters in the book interesting and that the chapters will inspire future research,
both from theoretical and practical viewpoints, to spur further advances in the
computer vision field.
The editors would like to take this opportunity to express their sincere gratitude
to the contributors for extending their wholehearted support in sharing some
of their latest results and findings. Without their significant contributions, this
book could not have fulfilled its mission. The reviewers deserve our thanks for
their constructive and timely input. Special profound thanks go to Prof. Aly Farag,
Professor of Electrical and Computer Engineering, University of Louisville,
Kentucky, for writing the Foreword for this book. Finally, the editors acknowledge
the efforts of CRC Press Taylor & Francis for giving us the opportunity to edit
a book on deep learning for computer vision. In particular, we would like to thank
Dr. Rastislav Lukac, the editor of the Digital Imaging and Computer Vision book
series, and Nora Konopka for initiating this project. Indeed, the editorial staff
at CRC Press has done a meticulous job, and working with them was a pleasant
experience.

Mahmoud Hassaballah
Qena, Egypt

Ali Ismail Awad


Luleå, Sweden
Editors Bio
Mahmoud Hassaballah was born in 1974, Qena, Egypt. He
received his BSc degree in Mathematics in 1997 and his MSc
degree in Computer Science in 2003, both from South Valley
University, Egypt, and his Doctor of Engineering (D Eng) in
computer science from Ehime University, Japan in 2011. He
was a visiting scholar with the department of computer &
communication science, Wakayama University, Japan in
2013 and GREAH laboratory, Le Havre Normandie
University, France in 2019. He is currently an associate professor of computer science at the faculty of computers and information, South Valley University, Egypt. He
served as a reviewer for several journals such as IEEE Transactions on Image
Processing, IEEE Transactions on Fuzzy Systems, Pattern Recognition, Pattern
Recognition Letters, IET Image Processing, IET Computer Vision, IET Biometrics,
Journal of Real-Time Image Processing, and Journal of Electronic Imaging. He has
published over 50 research papers in refereed international journals and conferences.
His research interests include feature extraction, object detection/recognition, artificial intelligence, biometrics, image processing, computer vision, machine learning,
and data hiding.

Ali Ismail Awad (SMIEEE, PhD, PhD, MSc, BSc) is currently an Associate Professor
(Docent) with the Department of Computer Science, Electrical, and Space Engineering,
Luleå University of Technology, Luleå, Sweden, where he also serves as a Coordinator
of the Master Programme in Information Security. He is a Visiting Researcher with
in Information Security. He is a Visiting Researcher with
the University of Plymouth, United Kingdom. He is also
an Associate Professor with the Electrical Engineering
Department, Faculty of Engineering, Al-Azhar University at Qena, Qena, Egypt. His
research interests include information security, Internet-of-Things security, image
analysis with applications in biometrics and medical imaging, and network security.
He has edited or co-edited five books and authored or co-authored several journal
articles and conference papers in these areas. He is an Editorial Board Member of the
following journals: Future Generation Computer Systems, Computers & Security,
Internet of Things: Engineering Cyber Physical Human Systems, and Health
Information Science and Systems. Dr Awad is currently an IEEE senior member.

Contributors
Ahmad El-Sallab
Valeo Company
Cairo, Egypt

Ahmed Nassar
IRISA Institute
Rennes, France

Alaa S. Al-Waisy
University of Bradford
Bradford, UK

Ali Ismail Awad
Luleå University of Technology
Luleå, Sweden
and
Al-Azhar University
Qena, Egypt

Amin Ullah
Sejong University
Seoul, South Korea

Ashraf A. M. Khalaf
Minia University
Minia, Egypt

François Berry
University Clermont Auvergne
Clermont-Ferrand, France

Guanghui Wang
University of Kansas
Kansas City, Kansas

Hazem Rashed
Valeo Company
Cairo, Egypt

Hesham F. A. Hamed
Egyptian Russian University
Cairo, Egypt
and
Minia University
Minia, Egypt

Javier Ruiz-del-Solar
University of Chile
Santiago, Chile

Kaidong Li
University of Kansas
Kansas City, Kansas

Kamel Abdelouahab
Clermont Auvergne University
Clermont-Ferrand, France

Khalid M. Hosny
Zagazig University
Zagazig, Egypt

Khan Muhammad
Sejong University
Seoul, South Korea

Mahmoud Hassaballah
South Valley University
Qena, Egypt

Mahmoud Khaled Abd-Ellah
Al-Madina Higher Institute for Engineering and Technology
Giza, Egypt

Maxime Pelcat
University of Rennes
Rennes, France

Miyoung Lee
Sejong University
Seoul, South Korea

Mohamed A. Kassem
Kafr El Sheikh University
Kafr El Sheikh, Egypt

Mohamed ElHelw
Nile University
Giza, Egypt

Mohamed M. Foaud
Zagazig University
Zagazig, Egypt

Mohammed A. Al-masni
Kyung Hee University
Seoul, South Korea
and
Yonsei University
Seoul, South Korea

Mugahed A. Al-antari
Kyung Hee University
Seoul, South Korea
and
Sana'a Community College
Sana'a, Republic of Yemen

Patricio Loncomilla
University of Chile
Santiago, Chile

Rami Qahwaji
University of Bradford
Bradford, UK

Salman Khan
Sejong University
Seoul, South Korea

Senthil Yogamani
Valeo Company
Galway, Ireland

Shumoos Al-Fahdawi
University of Bradford
Bradford, UK

Sung Wook Baik
Sejong University
Seoul, South Korea

Tae-Seong Kim
Kyung Hee University
Seoul, South Korea

Tanveer Hussain
Sejong University
Seoul, South Korea

Usman Sajid
University of Kansas
Kansas City, Kansas

Wenchi Ma
University of Kansas
Kansas City, Kansas

Yuanwei Wu
University of Kansas
Kansas City, Kansas
1 Accelerating the CNN
Inference on FPGAs
Kamel Abdelouahab, Maxime Pelcat,
and François Berry

CONTENTS
1.1 Introduction ......................................................................................................2
1.2 Background on CNNs and Their Computational Workload ............................3
1.2.1 General Overview.................................................................................3
1.2.2 Inference versus Training ..................................................................... 3
1.2.3 Inference, Layers, and CNN Models ....................................................3
1.2.4 Workloads and Computations...............................................................6
1.2.4.1 Computational Workload .......................................................6
1.2.4.2 Parallelism in CNNs ..............................................................8
1.2.4.3 Memory Accesses ..................................................................9
1.2.4.4 Hardware, Libraries, and Frameworks ................................ 10
1.3 FPGA-Based Deep Learning.......................................................................... 11
1.4 Computational Transforms ............................................................................. 12
1.4.1 The im2col Transformation ................................................................ 13
1.4.2 Winograd Transform .......................................................................... 14
1.4.3 Fast Fourier Transform ....................................................................... 16
1.5 Data-Path Optimizations ................................................................................ 16
1.5.1 Systolic Arrays.................................................................................... 16
1.5.2 Loop Optimization in Spatial Architectures ...................................... 18
Loop Unrolling ................................................................................... 19
Loop Tiling .........................................................................................20
1.5.3 Design Space Exploration................................................................... 21
1.5.4 FPGA Implementations ...................................................................... 22
1.6 Approximate Computing of CNN Models ..................................................... 23
1.6.1 Approximate Arithmetic for CNNs.................................................... 23
1.6.1.1 Fixed-Point Arithmetic ........................................................ 23
1.6.1.2 Dynamic Fixed Point for CNNs...........................................28
1.6.1.3 FPGA Implementations ....................................................... 29
1.6.1.4 Extreme Quantization and Binary Networks....................... 29
1.6.2 Reduced Computations....................................................................... 30
1.6.2.1 Weight Pruning .................................................................... 31
1.6.2.2 Low Rank Approximation ................................................... 31
1.6.2.3 FPGA Implementations ....................................................... 32

2 Deep Learning in Computer Vision

1.7 Conclusions..................................................................................................... 32
Bibliography ............................................................................................................ 33

1.1 INTRODUCTION
The exponential growth of big data during the last decade motivates innovative
methods to extract high-level semantic information from raw sensor data such as
videos, images, and speech sequences. Among the proposed methods, convolutional
neural networks (CNNs) [1] have become the de facto standard by delivering
near-human accuracy in many applications related to machine vision (e.g.,
classification [2], detection [3], segmentation [4]) and speech recognition [5].
This performance comes at the price of a large computational cost, as CNNs
require up to 38 GOPs to classify a single frame [6]. As a result, dedicated
hardware is required to accelerate their execution. Graphics processing units
(GPUs) are the most widely used platform to implement CNNs, as they offer the
best performance in terms of pure computational throughput, reaching up to
11 TFLOPs [7]. Nevertheless, in terms of power consumption, field-programmable
gate array (FPGA) solutions are known to be more energy efficient than GPUs.
While GPU implementations have demonstrated state-of-the-art computational
performance, CNN acceleration will soon be moving towards FPGAs for two reasons.
First, recent improvements in FPGA technology put FPGA performance within
striking distance of GPUs, with a reported FPGA performance of 9.2 TFLOPs [8].
Second, recent trends in CNN development increase the sparsity of CNNs and
use extremely compact data types. These trends favor FPGA devices, which are
designed to handle irregular parallelism and custom data types. As a result,
next-generation CNN accelerators are expected to deliver up to 5.4× better
computational throughput than GPUs [7].
As an inflection point in the development of CNN accelerators might be near, we
conduct a survey on FPGA-based CNN accelerators. While a similar survey can be
found in [9], we focus in this chapter on the recent techniques that were not
covered in previous works. In addition to this chapter, we refer the reader to
the works of Venieris et al. [10], which review the toolflows automating the
CNN mapping process, and to the works of Sze et al., which focus on ASICs for
deep learning acceleration.
The amount and diversity of research on the subject of CNN FPGA acceleration
within the last 3 years demonstrate the tremendous industrial and academic
interest. This chapter presents a state-of-the-art review of CNN inference
accelerators over FPGAs. The computational workloads, their parallelism, and
the involved memory accesses are analyzed. At the level of neurons,
optimizations of the convolutional and fully connected (FC) layers are
explained and the performances of the different methods compared. At the
network level, approximate computing and data-path optimization methods are
covered and state-of-the-art approaches compared. The methods and tools
investigated in this survey represent the recent trends in FPGA CNN inference
accelerators and will fuel the future advances in efficient hardware deep
learning.

1.2 BACKGROUND ON CNNS AND THEIR COMPUTATIONAL WORKLOAD
In this first section, we overview the main features of CNNs, mainly focusing
on the computations and parallelism patterns involved during their inference.

1.2.1 GENERAL OVERVIEW


Deep* CNNs are feed-forward†, sparsely connected‡ neural networks. A typical
CNN structure consists of a pipeline of layers. Each layer inputs a set of data, known
as a feature map (FM), and produces a new set of FMs with higher-level semantics.

1.2.2 INFERENCE VERSUS TRAINING


As typical machine learning algorithms, CNNs are deployed in two phases. First,
the training stage works on a known set of annotated data samples to create a
model with modeling power (i.e., whose semantics extrapolate to natural data
outside the training set). This phase implements the back-propagation algorithm
[11], which iteratively updates CNN parameters such as convolution weights to
improve the predictive power of the model. A special case of CNN training is
fine-tuning. When fine-tuning a model, the weights of a previously trained
network are used to initialize the parameters of a new training. These weights
are then adjusted for a new constraint, such as a different dataset or a
reduced precision.
The second phase, known as inference, uses the learned model to classify new
data samples (i.e., inputs that were not previously seen by the model). In a
typical setup, CNNs are trained/fine-tuned only once, on large clusters of
GPUs. By contrast, the inference is run each time a new data sample has to be
classified. As a consequence, the literature mostly focuses on accelerating the
inference phase, and our discussion accordingly overviews the main methods
employed to accelerate it. Moreover, since most CNN accelerators benchmark
their performance on models trained for image classification, we focus this
chapter on that application. Nonetheless, the methods detailed in this survey
can be employed to accelerate CNNs for other applications such as object
detection, image segmentation, and speech recognition.

1.2.3 INFERENCE, LAYERS, AND CNN MODELS


CNN inference refers to the feed-forward propagation of B input images across L
layers. This section details the computations involved in the major types of
these layers. A common practice is to manipulate layers, parameters, and FMs as
multidimensional arrays, as listed in Table 1.1. Note that, when relevant, the
type of a layer is denoted with a superscript, and its position in the network
with a subscript.

* Includes a large number of layers, typically above three.
† The information flows from the neurons of a layer ℓ towards the neurons of a layer ℓ + 1.
‡ CNNs implement the weight-sharing technique, applying a small number of weights across all the
  input pixels (i.e., image convolution).



TABLE 1.1
Tensors Involved in the Inference of a Given Layer ℓ with Their Dimensions

  X   Input FMs         B × C × H × W      B      Batch size (number of input frames)
  Y   Output FMs        B × N × V × U      W/H/C  Width/Height/Depth of input FMs
  Θ   Learned filters   N × C × J × K      U/V/N  Width/Height/Depth of output FMs
  β   Learned biases    N                  K/J    Horizontal/Vertical kernel size

A convolutional layer (conv) carries out the feature extraction process by applying – as
illustrated in Figure 1.1 – a set of three-dimensional convolution filters Θ^conv to a set
of B input volumes X^conv. Each input volume has a depth C and can be a color image
(in the case of the first conv layer) or an output generated by previous layers in the
network. Applying a three-dimensional filter to a three-dimensional input results in
a two-dimensional FM. Thus, applying N three-dimensional filters in a layer results
in a three-dimensional output with a depth N.
In some CNN models, a learned offset β^conv – called a bias – is added to the processed
feature maps. However, this practice has been discarded in recent models [6]. The
computations involved in the feed-forward propagation of conv layers are detailed in
Equation 1.1.

∀ {b, n, v, u} ∈ [1, B] × [1, N] × [1, V] × [1, U]

    Y^conv[b, n, v, u] = β^conv[n] + Σ_{c=1}^{C} Σ_{j=1}^{J} Σ_{k=1}^{K} X^conv[b, c, v+j, u+k] · Θ^conv[n, c, j, k]        (1.1)

One may note that applying a depth convolution to a 3D input boils down to applying
a mainstream 2D convolution to each of the 2D channels of the input, then, at each
point, summing the results across all the channels, as shown in Equation 1.2.

FIGURE 1.1 Feed-forward propagation in conv, act, and pool layers (batch size B = 1, bias
β omitted).

∀ n ∈ [1, N]

    Y^conv[n] = β^conv[n] + Σ_{c=1}^{C} conv2D( X^conv[c], Θ^conv[n, c] )        (1.2)
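As a concrete reading of Equations 1.1 and 1.2 (batch size B = 1), the conv layer can be transcribed into a direct loop nest. This sketch is illustrative, not taken from the chapter: indices are 0-based rather than the 1-based convention above, and the toy dimensions in the usage are arbitrary.

```python
def conv_layer(X, Theta, beta):
    """Direct evaluation of Equation 1.1 with batch size B = 1.

    X:     input FMs,  nested lists of shape C x H x W
    Theta: filters,    nested lists of shape N x C x J x K
    beta:  biases,     length N
    Returns output FMs of shape N x V x U, with V = H - J + 1, U = W - K + 1.
    """
    C, H, W = len(X), len(X[0]), len(X[0][0])
    N = len(Theta)
    J, K = len(Theta[0][0]), len(Theta[0][0][0])
    V, U = H - J + 1, W - K + 1
    # Initialize each output FM with its bias (the beta term of Eq. 1.1).
    Y = [[[beta[n] for _ in range(U)] for _ in range(V)] for n in range(N)]
    for n in range(N):                       # one 3D filter per output FM
        for v in range(V):                   # output rows
            for u in range(U):               # output columns
                for c in range(C):           # sum over input channels (Eq. 1.2)
                    for j in range(J):       # 2D kernel window
                        for k in range(K):
                            Y[n][v][u] += X[c][v + j][u + k] * Theta[n][c][j][k]
    return Y
```

For example, a single 2×2 all-ones filter applied to a 3×3 all-ones channel yields a 2×2 output of fours, i.e. one MAC per kernel tap per output pixel.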

Each conv layer of a CNN is usually followed by an activation layer that applies a
nonlinear function to all the values of FMs. Early CNNs were trained with TanH
or Sigmoid functions, but recent models employ the rectified linear unit (ReLU)
function, which grants faster training times and less computational complexity, as
highlighted in Krizhevsky et al. [12].

∀ {b, n, h, w} ∈ [1, B] × [1, N] × [1, H] × [1, W]

    Y^act[b, n, h, w] = act(X^act[b, n, h, w]) | act := TanH, Sigmoid, ReLU, …        (1.3)

The convolutional and activation parts of a CNN are directly inspired by the
cells of the visual cortex in neuroscience [13]. This is also the case with pooling
layers, which are periodically inserted in between successive conv layers. As
shown in Equation 1.4, pooling sub-samples each channel of the input FM by
selecting either the average or, more commonly, the maximum of a given
neighborhood K. As a result, the dimensionality of an FM is reduced, as illustrated
in Figure 1.1.

∀ {b, n, v, u} ∈ [1, B] × [1, N] × [1, V] × [1, U]

    Y^pool[b, n, v, u] = max_{p, q ∈ [1:K]} ( X^pool[b, n, v+p, u+q] )        (1.4)
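A minimal sketch of Equation 1.4 for a single channel. The non-overlapping stride-K choice is an assumption made here for illustration; the equation above leaves the stride implicit.

```python
def max_pool(X, K):
    """Max pooling of one FM channel (Equation 1.4) over a K x K
    neighborhood, assuming (for this sketch) a stride equal to K,
    i.e. non-overlapping windows."""
    H, W = len(X), len(X[0])
    return [[max(X[v + p][u + q] for p in range(K) for q in range(K))
             for u in range(0, W - K + 1, K)]
            for v in range(0, H - K + 1, K)]
```

A 4×4 channel pooled with K = 2 shrinks to 2×2, keeping the largest value of each quadrant.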

When deployed for classification purposes, the CNN pipeline is often terminated
by FC layers. In contrast with convolutional layers, FC layers do not implement
weight sharing and involve as many weights as input values (i.e., W = K, H = J, U = V = 1).
Moreover, in a similar way as for conv layers, a nonlinear function is applied to the
outputs of FC layers.

∀ {b, n} ∈ [1, B] × [1, N]

    Y^fc[b, n] = β^fc[n] + Σ_{c=1}^{C} Σ_{h=1}^{H} Σ_{w=1}^{W} X^fc[b, c, h, w] · Θ^fc[n, c, h, w]        (1.5)
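Once the C × H × W input is flattened into a vector, Equation 1.5 is a plain matrix–vector product plus a bias. A minimal sketch with arbitrary dimensions:

```python
def fc_layer(x, theta, beta):
    """Fully connected layer (Equation 1.5, batch size B = 1).

    x:     flattened input of length C*H*W
    theta: weight matrix, N rows of length C*H*W
    beta:  biases, length N
    Returns the N output values, one dot product per output neuron.
    """
    return [beta[n] + sum(w * xi for w, xi in zip(theta[n], x))
            for n in range(len(theta))]
```

Note that each weight is used exactly once per input, which is why an FC layer performs one MAC per weight.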

The Softmax function is a generalization of the Sigmoid function; it "squashes"
an N-dimensional vector X into a vector whose entries lie in the range [0, 1].
The Softmax function is used in various multi-class classification methods,
especially in CNNs. In this case, the Softmax layer is placed at the end of the
network, and the dimension of the vector it operates on (i.e., N) represents
the number of classes in the considered dataset. Thus, the input of the Softmax
is the data generated by the last fully connected layer, and the output is the
probability predicted for each class.

∀ {b, n} ∈ [1, B] × [1, N]

    Softmax(X[b, n]) = exp(X[b, n]) / Σ_{c=1}^{N} exp(X[b, c])        (1.6)
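Equation 1.6 can be sketched in a few lines. Subtracting max(X) first is a standard numerical-stability trick not mentioned above; it leaves the result unchanged but keeps exp from overflowing on large scores.

```python
import math

def softmax(x):
    """Softmax of one score vector (Equation 1.6). Shifting by max(x)
    cancels out in the ratio, so the probabilities are identical."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]
```

The outputs always sum to 1, so they can be read as class probabilities.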

Batch normalization was introduced [14] to speed up training by linearly shifting
and scaling the distribution of a given batch of inputs B to have zero mean and
unit variance. These layers are also of interest when implementing binary neural
networks (BNNs), as they reduce the quantization error compared to an arbitrary
input distribution, as highlighted in Hubara et al. [15]. Equation 1.7 details
the processing of batch norm layers, where the mean μ and the variance σ are
statistics collected during training, α and γ are parameters learned during
training, and ϵ is a hyper-parameter set empirically for numerical stability
purposes (i.e., avoiding division by zero).

∀ {b, n, v, u} ∈ [1, B] × [1, N] × [1, V] × [1, U]

    Y^BN[b, n, v, u] = α · ( X^BN[b, n, v, u] − μ ) / √(σ² + ϵ) + γ        (1.7)
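Equation 1.7 at inference time can be sketched as follows. This is a minimal sketch: the parameter values in the usage are made up, and the α/γ naming follows the equation above rather than the more common γ/β convention.

```python
import math

def batch_norm(x, mu, var, alpha, gamma, eps=1e-5):
    """Batch-normalize a list of activations (Equation 1.7), using the
    statistics mu and var collected during training and the learned
    scale alpha and shift gamma. eps guards against division by zero."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [alpha * (v - mu) * inv_std + gamma for v in x]
```

At inference, mu, var, alpha, and gamma are constants, so the whole layer folds into one multiply and one add per activation.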

1.2.4 WORKLOADS AND COMPUTATIONS


The accuracy of CNN models has been increasing since their breakthrough in 2012
[12]. However, this accuracy comes at a high computational cost. The main challenge
facing CNN developers is to improve classification accuracy while maintaining
a tolerable computational workload. As shown in Table 1.2, this challenge was
successfully addressed by Inception [16] and ResNet [17] models, with their use of
bottleneck 1 × 1 convolutions that reduce both model size and computations while
increasing depth and accuracy.

1.2.4.1 Computational Workload


As shown in Equations 1.1 and 1.5, the processing of a CNN involves an intensive
use of the multiply-accumulate (MAC) operation. All these MAC operations take
place in conv and FC layers, while the remaining parts of the network are
element-wise transformations that can generally be implemented with
low-complexity computational requirements.

TABLE 1.2
Popular CNN Models with Their Computational Workload*

                  AlexNet   GoogleNet   VGG16    VGG19    ResNet101   ResNet-152
Model             [12]      [16]        [6]      [6]      [17]        [17]
Top-1 err (%)     42.9      31.3        28.1     27.3     23.6        23.0
Top-5 err (%)     19.80     10.07       9.90     9.00     7.1         6.7
L_c               5         57          13       16       104         155
Σ C_ℓ^conv        666 M     1.58 G      15.3 G   19.5 G   7.57 G      11.3 G
Σ W_ℓ^conv        2.33 M    5.97 M      14.7 M   20 M     42.4 M      58 M
Act               ReLU      ReLU        ReLU     ReLU     ReLU        ReLU
Pool              3         14          5        5        2           2
L_f               3         1           3        3        1           1
Σ C_ℓ^fc          58.6 M    1.02 M      124 M    124 M    2.05 M      2.05 M
Σ W_ℓ^fc          58.6 M    1.02 M      124 M    124 M    2.05 M      2.05 M
C                 724 M     1.58 G      15.5 G   19.6 G   7.57 G      11.3 G
W                 61 M      6.99 M      138 M    144 M    44.4 M      60 M

* Accuracy measured on single crops of the ImageNet test set.

In this chapter, the computational workload C of a given CNN corresponds to the
number of MACs it involves during inference*. The number of these MACs mainly
depends on the topology of the network, and more particularly on the number of
conv and FC layers and their dimensions. Thus, the computational workload can be
expressed as in Equation 1.8, where L_c (L_f) is the number of conv (fully
connected) layers, and C_ℓ^conv (C_ℓ^fc) is the number of MACs occurring in a
given convolutional (fully connected) layer ℓ.

    C = Σ_{ℓ=1}^{L_c} C_ℓ^conv + Σ_{ℓ=1}^{L_f} C_ℓ^fc        (1.8)

    C_ℓ^conv = N_ℓ · C_ℓ · J_ℓ · K_ℓ · U_ℓ · V_ℓ        (1.9)

    C_ℓ^fc = N_ℓ · C_ℓ · W_ℓ · H_ℓ        (1.10)

* Batch size is set to 1 for clarity purposes.



In a similar way, the number of weights, and consequently the size of a given CNN
model, can be expressed as follows:

    W = Σ_{ℓ=1}^{L_c} W_ℓ^conv + Σ_{ℓ=1}^{L_f} W_ℓ^fc        (1.11)

    W_ℓ^conv = N_ℓ · C_ℓ · J_ℓ · K_ℓ        (1.12)

    W_ℓ^fc = N_ℓ · C_ℓ · W_ℓ · H_ℓ        (1.13)

For state-of-the-art CNN models, L_c, N_ℓ, and C_ℓ can be quite large. This makes
CNNs computationally and memory intensive: for instance, the classification of a
single frame using the VGG19 network requires 19.5 billion MAC operations.
It can be observed in Table 1.2 that most of the MACs occur in the convolutional
parts; consequently, 90% of the execution time of a typical inference is spent
on conv layers [18]. By contrast, FC layers contain most of the weights and thus
dominate the size of a given CNN model.
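Equations 1.8 through 1.13 amount to simple bookkeeping. The sketch below uses a made-up two-layer toy network; the dimensions are illustrative and do not correspond to any model of Table 1.2.

```python
def conv_macs(N, C, J, K, U, V):
    """MACs of one conv layer (Equation 1.9)."""
    return N * C * J * K * U * V

def conv_weights(N, C, J, K):
    """Weights of one conv layer (Equation 1.12)."""
    return N * C * J * K

def fc_macs(N, C, H, W):
    """MACs of one FC layer (Equation 1.10); equals its weight count
    (Equation 1.13), since FC layers do one MAC per weight."""
    return N * C * H * W

# Hypothetical 2-layer toy network: one 3x3 conv (64 filters on an RGB
# input, 224x224 output) followed by a 10-way FC classifier.
C_total = conv_macs(N=64, C=3, J=3, K=3, U=224, V=224) + fc_macs(N=10, C=64, H=1, W=1)
W_total = conv_weights(N=64, C=3, J=3, K=3) + fc_macs(N=10, C=64, H=1, W=1)
```

Because an FC layer performs exactly one MAC per weight, Equations 1.10 and 1.13 coincide, which is why the Σ C_ℓ^fc and Σ W_ℓ^fc rows of Table 1.2 are identical.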

1.2.4.2 Parallelism in CNNs


The high computational workload of CNNs makes their inference a challenging task,
especially on low-energy embedded devices. The key solution to this challenge is
to leverage the extensive concurrency they exhibit. These parallelism
opportunities can be formalized as follows:

• Batch Parallelism: CNN implementations can simultaneously classify
  multiple frames grouped as a batch B in order to reuse the filters in each
  layer, minimizing the number of memory accesses. However, as shown in [10],
  batch parallelism quickly reaches its limits. This is due to the fact that
  most memory transactions result from storing intermediate results, not from
  loading CNN parameters. Consequently, reusing the filters only slightly
  impacts the overall processing time per image.
• Inter-layer Pipeline Parallelism: CNNs have a feed-forward hierarchical
  structure consisting of a succession of data-dependent layers. These layers
  can be executed in a pipelined fashion by launching layer ℓ before ending
  the execution of layer ℓ − 1. This pipelining costs latency but increases
  throughput.

Moreover, the execution of the most computationally intensive parts (i.e., conv
layers) exhibits the four following types of concurrency:

• Inter-FM Parallelism: Each two-dimensional plane of an FM can be
  processed separately from the others, meaning that P_N elements of Y^conv can
  be computed in parallel (0 < P_N < N).
• Intra-FM Parallelism: In a similar way, pixels of a single output FM plane
  are data-independent and can thus be processed concurrently by evaluating
  P_V × P_U values of Y^conv[n] (0 < P_V × P_U < V × U).
• Inter-convolution Parallelism: Depth convolutions occurring in conv
  layers can be expressed as a sum of 2D convolutions, as shown in Equation
  1.2. These 2D convolutions can be evaluated simultaneously by computing
  P_C elements concurrently (0 < P_C < C).
• Intra-convolution Parallelism: The 2D convolutions involved in the
  processing of conv layers can be implemented in a pipelined fashion such as
  in [76]. In this case, P_J × P_K multiplications are implemented concurrently
  (0 < P_J × P_K < J × K).
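These four sources compose multiplicatively. As a back-of-the-envelope sketch (an idealized model assuming one MAC per lane per cycle and no memory stalls; the unroll-factor names P_N, P_C, P_V × P_U, and P_J × P_K follow the bullets above):

```python
from math import ceil

def conv_cycles(N, C, J, K, U, V, PN=1, PC=1, PV=1, PU=1, PJ=1, PK=1):
    """Ideal cycle count of one conv layer when the four intra-layer
    parallelism sources are unrolled by PN (inter-FM), PV*PU (intra-FM),
    PC (inter-convolution), and PJ*PK (intra-convolution).
    With all factors at 1, this degenerates to the MAC count of Eq. 1.9."""
    return (ceil(N / PN) * ceil(C / PC)
            * ceil(V / PV) * ceil(U / PU)
            * ceil(J / PJ) * ceil(K / PK))
```

For example, unrolling 8 output FMs and a full 3×3 kernel window (PN = 8, PJ = PK = 3) divides the ideal cycle count by 72 relative to a fully sequential execution.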

1.2.4.3 Memory Accesses


As a consequence of the previous discussion, the inference of a CNN exhibits
large vectorization opportunities that can be exploited by allocating multiple
computational resources to concurrently process multiple features. However,
this parallelization cannot accelerate the execution of a CNN if no
data-caching strategy is implemented. In fact, memory bandwidth is often the
bottleneck when processing CNNs.
In the FC parts, the execution can be memory-bound because of the high number
of weights that these layers contain and, consequently, the high number of
memory reads required.
This is expressed in Equation 1.14, where M_ℓ^fc refers to the number of memory
accesses occurring in an FC layer ℓ. This number can be written as the sum of
the memory accesses reading the inputs X_ℓ^fc, the memory accesses reading the
weights Θ_ℓ^fc, and the memory accesses writing the results Y_ℓ^fc.

    M_ℓ^fc = MemRd(X_ℓ^fc) + MemRd(Θ_ℓ^fc) + MemWr(Y_ℓ^fc)        (1.14)

           = C_ℓ H_ℓ W_ℓ + N_ℓ C_ℓ H_ℓ W_ℓ + N_ℓ        (1.15)

           ≈ N_ℓ C_ℓ H_ℓ W_ℓ        (1.16)

Note that the fully connected parts of state-of-the-art models involve large
values of N_ℓ and C_ℓ, making the memory reads of weights the dominant factor,
as formulated in Equation 1.16. In this context, batch parallelism can
significantly accelerate the execution of CNNs with a large number of FC layers.
In the conv parts, the high number of MAC operations results in a high number
of memory accesses, as each MAC requires at least 2 memory reads and 1 memory
write*. This number of memory accesses accumulates with the high dimensions of
data manipulated by conv layers, as shown in Equation 1.18. If all these
accesses are towards external memory (for instance, DRAM), throughput and
energy consumption

* This is the best-case scenario of a fully pipelined MAC, where intermediate results do not need to be
loaded.

will be highly impacted, because DRAM access engenders high latency and energy
consumption, even more than the computation itself [21].

    M_ℓ^conv = MemRd(X_ℓ^conv) + MemRd(Θ_ℓ^conv) + MemWr(Y_ℓ^conv)        (1.17)

             = C_ℓ H_ℓ W_ℓ + N_ℓ C_ℓ J_ℓ K_ℓ + N_ℓ U_ℓ V_ℓ        (1.18)
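The access counts of Equations 1.14 through 1.18 translate directly into code. These are counts of idealized accesses with no caching assumed, and the dimensions in the usage are arbitrary:

```python
def fc_mem_accesses(N, C, H, W):
    """Memory accesses of one FC layer (Equations 1.14-1.15):
    input reads + weight reads + output writes."""
    return C * H * W + N * C * H * W + N

def conv_mem_accesses(N, C, H, W, J, K, U, V):
    """Memory accesses of one conv layer (Equations 1.17-1.18):
    input reads + weight reads + output writes."""
    return C * H * W + N * C * J * K + N * U * V
```

For FC layers with large N and C, the weight-read term N·C·H·W dwarfs the other two, which is the approximation of Equation 1.16.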

The number of these DRAM accesses, and thus latency and energy consumption, can
be reduced by implementing a memory-caching hierarchy using on-chip memories.
As discussed in the next sections, state-of-the-art CNN accelerators employ
register files as well as several levels of caches. The former, being the
fastest, are implemented nearest to the computational resources. The latency
and energy consumption of these caches are lower by several orders of magnitude
than those of external memory accesses, as pointed out in Sze et al. [22].

1.2.4.4 Hardware, Libraries, and Frameworks


In order to exploit the parallelism of CNNs, dedicated hardware accelerators are
developed. Most of them are based on GPUs, which are known to perform well
on regular parallelism patterns thanks to the SIMD and SIMT execution models, a
dense collection of floating-point computing elements that peak at 12 TFLOPs,
and high-capacity/bandwidth on- and off-chip memories [23]. To support these
hardware accelerators, specialized libraries for deep learning have been
developed to provide the necessary programming abstraction, such as cuDNN on
Nvidia GPUs [24]. Built upon these libraries, dedicated frameworks for deep
learning, such as Caffe [25] and TensorFlow [26], are proposed to improve the
productivity of conceiving, training, and deploying CNNs.
Besides GPU implementations, numerous FPGA accelerators for CNNs have been
proposed. FPGAs are fine-grained programmable devices that can capture the CNN
parallelism patterns with no memory bottleneck, thanks to the following:

1. A high density of hard-wired digital signal processor (DSP) blocks that are
   able to achieve up to 20 TMACs (8 TFLOPs) [8].
2. A collection of in-situ on-chip memories, located next to the DSPs, that can
   be exploited to significantly reduce the number of external memory accesses.

As a consequence, CNNs can benefit from a significant acceleration when running
on reconfigurable hardware. This has motivated numerous research efforts to
study FPGA-based CNN acceleration, targeting both high-performance computing
(HPC) applications [27] and embedded devices [28].
In the remaining parts of this chapter, we conduct a survey on methods and
hardware architectures to accelerate the execution of CNNs on FPGAs. The next
section lists the evaluation metrics used; then Sections 1.4 and 1.5
respectively study the computational transforms and the data-path optimizations
involved in recent CNN accelerators. Finally, the last section of this chapter
details how approximate computing is a key to FPGA-based deep learning and
overviews the main contributions implementing these techniques.

1.3 FPGA-BASED DEEP LEARNING


Accelerating a CNN on an FPGA-powered platform can be seen as an optimization
effort that focuses on one or several of the following criteria:

• Computational Throughput (T): A large number of the works studied
  in this chapter focus on reducing the CNN execution times on the FPGA
  (i.e., the computation latency) by improving the computational throughput
  of the accelerator. This throughput is usually expressed as the number of
  MACs an accelerator performs per second. While this metric is relevant in
  the case of HPC workloads, we prefer to report the throughput as the number
  of frames an accelerator processes per second (fps), which better suits
  the embedded vision context. The two metrics can be directly related using
  Equation 1.19, where C, defined in Equation 1.8, refers to the number of
  computations a CNN involves in order to process a single frame:

    T(FPS) = T(MACS) / C(MAC)        (1.19)
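Equation 1.19 is a one-line conversion. In the usage below, the 2 TMAC/s figure is a hypothetical accelerator throughput, while 19.6 GMACs is the VGG19 workload C reported in Table 1.2:

```python
def throughput_fps(t_macs_per_s, c_macs_per_frame):
    """Convert raw MAC throughput into frames per second (Equation 1.19)."""
    return t_macs_per_s / c_macs_per_frame

# Hypothetical 2 TMAC/s accelerator running the 19.6 GMAC VGG19 inference:
fps = throughput_fps(2e12, 19.6e9)  # about 102 fps
```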

• Classification/Detection Performance (A): Another way to reduce CNN execution
  times is to trade some of the modeling performance in favor of faster
  execution timings. For this reason, the classification and detection metrics
  are reported, especially when dealing with approximate computing methods.
  Classification performance is usually reported as top-1 and top-5 accuracies,
  and detection performance is reported using the mAP50 and mAP75 metrics.
• Energy and Power Consumption (P): Numerous FPGA-based acceleration methods
  can be categorized as either latency-driven or energy-driven. While the
  former focus on improving the computational throughput, the latter consider
  the power consumption of the accelerator, reported in watts. Alternatively,
  numerous latency-driven accelerators can be ported to low-power-range FPGAs
  and perform well under strict power consumption requirements.
• Resource Utilization (R): When it comes to FPGA acceleration, the utilization
  of the available resources (LUTs, DSP blocks, SRAM blocks) is always
  considered. Note that the resource utilization can be correlated to the power
  consumption*, but improving the ratio between the two is a technological
  problem that clearly exceeds the scope of this chapter. For this reason, both
  power consumption and resource utilization metrics will be reported when
  available.

An FPGA implementation of a CNN has to satisfy the former requirements. In this
perspective, the literature provides three main approaches to address the problem

* At a similar number of memory accesses. These accesses typically play the dominant role in the
  power consumption of an accelerator.

FIGURE 1.2 Main approaches to accelerate CNN inference on FPGAs.

of FPGA-based deep learning. These approaches mainly consist of computational
transforms, data-path optimizations, and approximate computing techniques, as
illustrated in Figure 1.2.

1.4 COMPUTATIONAL TRANSFORMS


In order to accelerate the execution of conv and FC layers, numerous
implementations rely on computational transforms. These transforms, which
operate on the FM and weight arrays, aim at vectorizing the implementations
and reducing the number of operations occurring during inference.
Three main transforms can be distinguished. The im2col method reshapes the
feature and weight arrays in a way that transforms depth convolutions into
matrix multiplications. The FFT method operates in the frequency domain,
transforming convolutions into multiplications. Finally, in Winograd filtering,
convolutions boil down to element-wise matrix multiplications thanks to a
tiling and a linear transformation of the data.
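To make the first transform concrete, here is a minimal im2col sketch for one input. The row/column layout chosen here is an illustrative assumption; BLAS-backed library implementations order the matrix differently.

```python
def im2col(X, J, K):
    """Lower a C x H x W input so that each row holds one J x K receptive
    field flattened across all C channels. The depth convolution then
    becomes a matrix product with the N x (C*J*K) filter matrix."""
    C, H, W = len(X), len(X[0]), len(X[0][0])
    V, U = H - J + 1, W - K + 1
    rows = []
    for v in range(V):
        for u in range(U):
            rows.append([X[c][v + j][u + k]
                         for c in range(C)
                         for j in range(J)
                         for k in range(K)])
    return rows  # (V*U) rows of length C*J*K

def matmul_conv(rows, theta_flat):
    """Evaluate the lowered convolution: one dot product per filter and
    per output position, i.e. a plain matrix multiplication."""
    return [[sum(w * x for w, x in zip(row_w, r)) for r in rows]
            for row_w in theta_flat]
```

The lowering duplicates overlapping pixels across rows, trading memory for the regularity of a single large matrix multiplication.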
These computational transforms mainly appear in temporal architectures and are
implemented by means of a variety of linear algebra libraries, such as OpenBLAS
for CPUs* or cuBLAS for GPUs†. Besides this, various implementations make use of
these transforms to efficiently map CNNs onto FPGAs.
This section discusses these three methods, highlighting their use cases and
computational improvements. For a better understanding, we recall that for each
layer ℓ:

• The input feature map is represented as a four-dimensional array X, in which
  the dimensions B × C × H × W respectively refer to the batch size, the number
  of input channels, the height, and the width.

* https://www.openblas.net/
† https://developer.nvidia.com/cublas
Other documents randomly have
different content
Humming in misery “Non è ——”
He thinks not of the west so brightly ——
Nor listens to the faint and distant ——
But dreams of the false fair to whom is ——
The wo which never, never, will take ——.

Answer

367.
Convert the following into a couplet, perfect in rhyme and rhythm,
without adding or omitting a single letter:

“O Deborah, Deborah! wo unto thee


For thou art as deaf as a post.”

Answer

368.
One and the same word of two syllables, answers each of the
following triplets:
I. MY FIRST springs in the mountains;
MY SECOND springs out of the mountains;
MY WHOLE comes with a spring over the mountains.

II. MY FIRST runs up the trees;


MY SECOND runs past the trees;
MY WHOLE spreads over the trees.

III. MY FIRST runs on two feet;


MY SECOND runs without feet;
MY WHOLE just glides away.

IV. To catch MY FIRST, men march after it;


To capture MY SECOND, they march over it;
To possess MY WHOLE, they go through a march before it.

Answer

369.

Said the Moon to the Sun:


“Is the daylight begun?”
Said the Sun to the Moon:
“Not a moment too soon.

You’re a full Moon,” said he;


She replied, with a frown,
“Well! I never did see
So uncivil a clown!”

Query: Why was the Moon so angry?


Answer

370.

It is as high as all the stars,


No well was ever sunk so low;
It is in age five thousand years,
It was not born an hour ago.

It is as wet as water is;


No red-hot iron e’er was drier;
As dark as night, as cold as ice,
Shines like the sun, and burns like fire.

No soul, nor body to consume—


No fox more cunning, dunce more dull;
’Tis not on earth, ’tis in this room,
Hard as a stone, and soft as wool.

’Tis of no color, but of snow,


Outside and inside black as ink;
All red, all yellow, green and blue—
This moment you upon it think.

In every noise, this strikes your ear,


’Twill soon expire, ’twill ne’er decay;
Does always in the light appear,
And yet was never seen by day.

Than the whole earth it larger is,


Yet, than a small pin’s point, ’tis less;
I’ll tell you ten times what it is,
Yet after all, you shall not guess!

’Tis in your mouth, ’twas never nigh—


Where’er you look, you see it still;
’Twill make you laugh, ’twill make you cry;
You feel it plain, touch what you will.

Answer

371.

My FIRST, so faithful, fond and true,


Will ne’er forsake or injure you;
My SECOND, coming from the street
You often trample under feet;
My THIRD you sleep on every night,
Serene and calm, without affright;
My WHOLE is what you should not be
When talking with your friends or me.

Answer

372.
My FIRST is a little river in England that gave name to a
celebrated university; my SECOND is always near; my THIRD sounds
like several large bodies of water; and my WHOLE is the name of a
Persian monarch, the neighing of whose horse gave him a kingdom
and a crown.
Answer

373.

A horse in the midst of a meadow suppose,


Made fast to a stake by a line from his nose;
How long must this line be, that, feeding all ’round,
Permits him to graze just an acre of ground?

Answer

374.

FIRST, A house where man and beast


Find themselves at home;
SECOND, Greatest have and least
Wheresoe’er you roam;
THIRD, A pronoun, meaning many,
(You must add an L);
ALL, The manner of our meeting
When you ring my bell.

Answer

375.
A DINNER PARTY.
THE GUESTS,

(Who are chiefly Anachronisms and other Incongruities.)


The First: Escaped his foes by having his horse shod backward.
Second: Surnamed, The Wizard of the North.
3d: Dissolved pearls in wine; “herself being dissolved in love.”
4th: Was first tutor to Alexander the Great.
5th: Said “There are no longer Pyrenees.”
6th: The Puritan Poet.
7th: The Locksmith King.
8th: The woman “who drank up her husband.”
9th: The Architect of St. Peter’s, Rome.
10th: The Miner King.
11th: Surnamed The King Maker.
12th: The woman who married the murderer of her husband, and
of her husband’s father.
13th: The Architect of St. Paul’s, London.
14th: The man who spoke fifty-eight languages; whom Byron
called “a Walking Polyglot.”
15th: A death-note, and a father’s pride.
16th: The Bard of Ayrshire.
17th: The Knight “without fear, and without reproach.”
18th: Refused, because he dared not accept, the crown of
England.
19th: Whose vile maxim was “every man has his own price.”
20th: The king who had an emperor for his foot-stool.
21st: The conqueror of the conqueror of Napoleon.
22d: The inventor of gunpowder.
23d: The king who entered the enemy’s camp, disguised as a
harper.
24th: The greatest English navigator of the eighteenth century.
25th: The inventor of the art of printing.
26th: Whom Napoleon called “the bravest of the brave.”
27th: Who first discovered that the earth is round.
28th: The diplomatic conqueror of Napoleon.
29th: The inventor of the reflecting telescope.
30th: The conqueror of Pharsalia.
31st: The inventor of the safety lamp.
32d: First introduced tobacco into England.
33d: Discovered the Antarctic Continent.
34th: The present poet laureate of England.
35th: His immediate predecessor.
36th: The first of the line.
37th: Surnamed “the Madman of the North.”
38th: The young prince who carried a king captive to England.
39th: First sailed around the world.
40th: Said “language was given us to enable us to conceal our
thoughts.”
41st: The Father of History.

DISHES, RELISHES, DESSERT.


1: Natural caskets of valuable gems.
2: Material and immaterial.
3: The possessive case of a pronoun and an ornament.
4: A sign of the zodiac, (pluralized).
5: One-third of Cæsar’s celebrated letter, and the centre of the
solar system.
6: Where Charles XII. went after the battle of Pultowa.
7: Whose English namesake Pope called “the brightest, wisest,
meanest of mankind.”
8: A celebrated English essayist.
9: Formerly a workman’s implement.
10: The ornamental part of the head.
11: An island in Lake Ontario.
12: Timber, and the herald of the morning.
13: A share in a rocky pathway.
14: The unruly member.
15: The earth, and a useful article.
16: An iron vessel, and eight ciphers.
17: A letter placed before what sufferers long for.
18: Like values, and odd ends.
19: A preposition, a piece of furniture, and a vowel, (pluralized).
20: An insect, followed by a letter, (pluralized).
21: The employment of some women, and the dread of all.
22: A kind of carriage, and a period of time.
23: A net for the head, an organ of sense, an emblem of beauty.
24: By adding two letters, you’ll have an Eastern conqueror.
25: Five-sevenths of a name not wholly unconnected with Bleak
House and Borrioboola Gha.
26: An underground room, and a vowel.
27: Skill, part of a needle, and to suffocate, (pluralized).
28: Antics.
29: An intimation burdens.
30: What if it should lose its savor?
31: Where you live a contented life; a hotel, and a vowel.
32: The staff of life.
33: What England will never become.
34: Scourges.
35: Running streams.
36: A domestic fowl, and the fruit of shrubs.
37: Married people.
38: A Holland prince serene, (pluralized).
39: To waste away, and Eve’s temptation, (pluralized).
40: Four-fifths of a month, and a dwelling, (pluralized).
41: Busybodies.
42: What Jeremiah saw in a vision.
43: Very old monkeys.
44: Approach convulsions.
45: Small blocks for holding bolts.
Answer
UNGUESSED RIDDLES.
As, on Louis Gaylord Clarke’s authority, “no museum is complete
without the club that killed Captain Cook”—he had seen it in six—so
no collection of riddles can be considered even presentable without
the famous enigma so often republished, and always with the
promise of “£50 reward for a solution.” It was first printed in the
Gentleman’s Magazine, London, in March, 1757.
The compiler of this little book has no hope of winning the prize,
and leaves the lists open to her readers, with a hope that some one
of them may succeed in “guessing” not only this, but the next riddle,
of whose true answer she has not the faintest idea.

The noblest object in the works of art;
The brightest scenes which Nature can impart;
The well-known signal in the time of peace;
The point essential in a tenant’s lease;
The farmer’s comfort as he drives the plough;
A soldier’s duty, and a lover’s vow;
A contract made before the nuptial tie;
A blessing riches never can supply;
A spot that adds new charms to pretty faces;
An engine used in fundamental cases;
A planet seen between the earth and sun;
A prize that merit never yet has won;
A loss which prudence seldom can retrieve;
The death of Judas, and the fall of Eve;
A part between the ancle and the knee;
A Papist’s toast, and a physician’s fee;
A wife’s ambition, and a parson’s dues;
A miser’s idol, and the badge of Jews.
If now, your happy genius can divine
The corresponding word in every line,
By the first letter plainly may be found
An ancient city that is much renowned.
The other unguessed, if not unguessable, riddle claims to come
from Cambridge, and is as follows:

A Headless man had a letter to write;
It was read by one who had lost his sight;
The Dumb repeated it, word for word;
And he was Deaf who listened and heard.

(See Key.)

QUESTIONS NOT TO BE ANSWERED UNTIL THE WORLD IS WISER.
Considering how useful the ocean is to mankind, are poets
justified in calling it “a waste of waters”?
How can we catch soft water when it is raining hard?
Where is the chair that “Verbum sat” in?
How does it happen that Fast days are always provokingly slow
days?
How is it that a storm looks heavy when it keeps lightening? And
the darker it grows, the more it lightens?
When it is said of a man that “he never forgets himself,” are we to
understand that his conduct is absolute perfection, or that it is the
perfection of selfishness?

PARADOXES.
1st. Polus instructed Ctesiphon in the art of pleading. Teacher
and pupil agreed that the tuition-fee should be paid when the latter
should win his first case. Some time having gone by, and the young
man being still without case or client, Polus, in despair of his fee,
brought the matter before the Court, each party pleading his own
cause. Polus spoke first, as follows:
“It is indifferent to me how the Court may decide this case. For, if
the decision be in my favor, I recover my fee by virtue of the
judgment; but, if my opponent wins the case, this being his first, I
obtain my fee according to the contract.”
Ctesiphon, being called on for his defense, said:
“The decision of the Court is indifferent to me. For, if in my favor, I
am thereby released from my debt to Polus. But, if I lose the case,
the fee cannot be demanded, according to our contract.”
2d. A certain king once built a bridge, and decreed that all
persons about to cross it, should be interrogated as to their
destination. If they told the truth they should be permitted to pass
unharmed; but, if they answered falsely, they should be hanged on a
gallows erected at the centre of the bridge. One day a man, about to
cross, was asked the usual question, and replied:
“I am going to be hanged on that gallows!”
Now, if they hanged him, he had told the truth, and ought to have
escaped; but, if they did not hang him, he had “answered falsely,”
and ought to have suffered the penalty of the law.
PART II.

FANCY TITLES FOR BOOKS.


Furnished by Thomas Hood for a blind door in the Library at
Chatsworth, for his friend the Duke of Devonshire.
Percy Vere. In Forty Volumes.
Dante’s Inferno; or Descriptions of Van Demon’s Land.
Ye Devyle on Two Styx: (black letter).
Lamb’s Recollections of Suet.
Lamb on the Death of Wolfe.
Plurality of Livings: with Regard to the Common Cat.
Boyle on Steam.
Blaine on Equestrian Burglary; or the Breaking-in of Horses.
John Knox on Death’s Door.
Peel on Bell’s System.
Life of Jack Ketch, with Cuts of his own Execution.
Cursory Remarks upon Swearing.
Cook’s Specimens of the Sandwich Tongue.
Recollections of Banister. By Lord Stair.
On the Affinity of the Death-Watch and Sheep-Tick.
Malthus’ Attacks of Infantry.
McAdam’s Views of Rhodes.
The Life of Zimmermann. By Himself.
Pygmalion. By Lord Bacon.
Rules of Punctuation. By a Thoroughbred Pointer.
Chronological Account of the Date Tree.
Kosciusko on the Right of the Poles to Stick up for Themselves.
Prize Poems. In Blank Verse.
Shelley’s Conchology.
Chantry on the Sculpture of the Chipaway Indians.
The Scottish Boccaccio. By D. Cameron.
Hoyle on the Game Laws.
Johnson’s Contradictionary.

When Hood and his family were living at Ostend for economy’s
sake, and with the same motive Mrs. Hood was doing her own work,
as we phrase it, he wrote to a friend in England: “Jane is becoming
an excellent cook and housemaid, and I intend to raise her wages.
She had nothing a week before, and now I mean to double it.”

It has been estimated that of all possible or impossible ways of
earning an honest livelihood, the most arduous, and at the same
time the way which would secure the greatest good to the greatest
number, would be to go around, cold nights, and get into bed for
people! To this might be added, going around cold mornings and
getting up for people; and, most useful and most onerous of all,
going around among undecided people and making up their minds.

In these days of universal condensation—of condensed milk,
condensed meats, condensed news—perhaps no achievement of
that kind ought to surprise us; but it must be acknowledged that
Thackeray’s condensing feat was the most extraordinary on record.
To compress “The Sorrows of Werther”—that three volumed novel: a
book of size—and tears, full of pathos and prettiness, of devotion
and desperation—into four stanzas that tell the whole story, was a
triumph of art which—which it is very possible GOETHE would admire
less than we do.

Werther had a love for Charlotte
Such as words can never utter.
Would you know how first he met her?
She was cutting bread and butter.

Charlotte was a married lady,
And a moral man was Werther,
And, for all the wealth of Indies,
Would do nothing for to hurt her.

So he sighed, and pined, and ogled,
And his passion boiled and bubbled,
Till he blew his silly brains out,
And no more was by it troubled.

Charlotte, when she saw his body
Borne before her on a shutter,
Like a well conducted person,
Went on cutting bread and butter.

Theodore Hook was celebrated not more for his marvelous
readiness in rhyming than for the quality of the rhymes themselves.
In his hands the English language seemed to have no choice: plain
prose appeared impossible. Motley was the only wear; fantastic
verse the only method of expression. No less does he press into his
service phrases from the languages, as in the curious verses which
follow, in praise of

CLUBS.
If any man loves comfort and
Has little cash to buy it, he
Should get into a crowded club—
A most select society!

While solitude and mutton cutlets
Serve infelix uxor, he
May have his club (like Hercules),
And revel there in luxury.

Here’s first the Athenæum club,
So wise, there’s not a man of it
That has not sense enough for six;
(In fact, that is the plan of it).

The very waiters answer you
With eloquence Socratical,
And always lay the knives and forks
In order mathematical.

The Union Club is quite superb;
Its best apartment daily is
The lounge of lawyers, doctors, beaux,
Merchants, cum multis aliis.

The Travellers are in Pall Mall,
And smoke cigars so cozily,
And dream they climb the highest Alps,
Or rove the plains of Moselai.

These are the stages which all men
Propose to play their parts upon;
For clubs are what the Londoners
Have clearly set their hearts upon.

OTHER WORLDS.
Mr. Mortimer Collins indulges in sundry very odd speculations
concerning them.

Other worlds! Those planets evermore
In their golden orbits swiftly glide on;
From quick Hermes by the solar shore,
To remote Poseidon.

Are they like this world? The glory shed
From the ruddy dawn’s unfading portals?
Does it fall on regions tenanted
By a race of mortals?

Are there merry maidens, wicked-eyed,
Peeping slyly through the cottage lattice?
Have they vintage bearing countries wide?
Have they oyster patties?

Does a mighty ocean roar and break
On dark rocks and sandy shores fantastic?
Have they any Darwins there, to make
Theories elastic?

Does their weather change? November fog,
Weeping April, March with many a raw gust?
And do thunder and demented dog
Come to them in August?

Nineteenth century science should unravel
All these queries, but has somehow missed ’em.
When will it be possible to travel
Through the solar system?

STILTS.

Behold the mansion reared by dædal Jack,
See the malt stored in many a plethoric sack,
In the proud cirque of Ivan’s bivouac.
Mark how the Rat’s felonious fangs invade
The golden stores in John’s pavilion laid.
Anon, with velvet foot and Tarquin strides,
Subtle Grimalkin to his quarry glides—
Grimalkin grim, that slew the fierce rodent,
Whose tooth insidious Johann’s sackcloth rent.
Lo! now the deep mouthed canine foe’s assault,
That vexed the avenger of the stolen malt,
Stored in the hallowed precincts of the hall,
That rose complete at Jack’s creative call.
Here stalks the impetuous Cow with crumpled horn,
Whereon the exacerbating hound was torn,
Who bayed the feline slaughter beast that slew
The Rat predacious, whose keen fangs ran through
The textile fibres that involved the grain,
That lay in Hans’ inviolate domain.
Here walks forlorn the Damsel crowned with rue,
Lactiferous spoils from vaccine dugs who drew,
Of that corniculate beast whose tortuous horn
Tossed to the clouds, in fierce, vindictive scorn,
The harrying hound whose braggart bark and stir
Arched the lithe spine and reared the indignant fur
Of Puss, that with verminicidal claw,
Struck the weird Rat, in whose insatiate maw
Lay reeking malt, that erst in Ivan’s courts we saw.
Robed in senescent garb that seems, in sooth,
Too long a prey to Chronos’ iron tooth;
Behold the man whose amorous lips incline,
Full with young Eros’ osculative sign,
To the lorn maiden, whose lac-albic hands
Drew albu-lactic wealth from lacteal glands
Of the immortal bovine, by whose horn
Distort, to realm ethereal was borne
The beast catulean, vexer of that sly
Ulysses quadrupedal, who made die
The old mordacious Rat, that dared devour
Antecedaneous ale, in John’s domestic bower.
Lo here, with hirsute honors doffed, succinct
Of saponaceous lock, the Priest, who linked
In Hymen’s golden bands the torn unthrift,
Whose means exiguous stared from many a rift,
Even as he kissed the virgin all forlorn,
Who milked the cow with implicated horn,
Who in fine wrath the canine torturer skied,
That dared to vex the insidious muricide,
Who let the auroral effluence through the pelt
Of the sly Rat that robbed the palace Jack had built.
The loud, cantankerous Shanghai comes at last,
Whose shouts aroused the shorn ecclesiast,
Who sealed the vows of Hymen’s sacrament,
To him who robed in garments indigent,
Exosculates the damsel lachrymose,
The emulgator of that hornèd brute morose,
That tossed the dog, that worried the cat, that kilt
The rat, that ate the malt, that lay in the house that Jack built.

The Home Journal having published a set of rather finical rules
for the conduct of equestrians in Central Park, a writer in Vanity Fair
supplemented and satirized them as follows:

ETIQUETTE OF EQUITATION.
When a gentleman is to accompany a lady on horseback,
1st. There must be two horses. (Pillions are out of fashion, except
in some parts of Wales, Australia and New Jersey.)
2d. One horse must have a side saddle. The gentleman will not
mount this horse. By bearing this rule in mind he will soon find no
difficulty in recognizing his own steed.
3d. The gentleman will assist the lady to mount and adjust her foot
in the stirrup. There being but one stirrup, he will learn upon which
side to assist the lady after very little practice.
4th. He will then mount himself. As there are two stirrups to his
saddle, he may mount on either side, but by no means on both; at
least, not at the same time. The former is generally considered the
most graceful method of mounting. If he has known Mr. Rarey he may
mount without the aid of stirrups. If not, he may try, but will probably
fail.
5th. The gentleman should always ride on the right side of the lady.
According to some authorities, the right side is the left. According to
others, the other is the right. If the gentleman is left handed, this will of
course make a difference. Should he be ambidexter, it will be
indifferent.
6th. If the gentleman and lady meet persons on the road, these will
probably be strangers, that is if they are not acquaintances. In either
case the gentleman and lady must govern themselves accordingly.
Perhaps the latter is the evidence of highest breeding.
7th. If they be going in different directions, they will not be
expected to ride in company, nor must these request those to turn and
join the others; and vice versa. This is indecorous, and indicates a
lack of savoir faire.
8th. If the gentleman’s horse throw him he must not expect him to
pick him up, nor the lady; but otherwise the lady may. This is important
to be borne in mind by both.
9th. On their return, the gentleman will dismount first and assist the
lady from her horse, but he must not expect the same courtesy in
return.
N. B.—These rules apply equally to every species of equitation, as
pony riding, donkey riding, rocking horse riding, or “riding on a rail.”
There will, of course, be modifications required, according to the form
and style of the animal.

SONG OF THE RECENT REBELLION.

AIR: “Lord Lovell.”

Lord Lovell he sat in St. Charles’ Hotel,
In St. Charles’ Hotel sat he;
As fine a case of a rebel swell,
As ever you’d wish to see, see, see,
As ever you’d wish to see!

Lord Lovell the town had sworn to defend,
A-waving his sword on high;
He swore that the last ounce of powder he’d spend,
And in the last ditch he would die, die, die,
And in the last ditch he would die.

He swore by black and he swore by blue,
He swore by the stars and bars,
That never he’d fly from a Yankee crew
While he was a son of Mars, Mars, Mars,
While he was a son of Mars.

He had fifty thousand gal-li-ant men,
Fifty thousand men had he,
Who had all sworn with him they would never surren-