
POLYCiNN: Multiclass Binary Inference Engine using Convolutional Decision Forests


Ahmed M. Abdelsalam¹, Ahmed Elsheikh², Jean-Pierre David³ and J. M. Pierre Langlois¹
¹Department of Computer and Software Engineering, ³Department of Electrical Engineering, Polytechnique Montréal, Canada
²Mathematical and Engineering Physics Department, Faculty of Engineering, Cairo University, Egypt
{ahmed.abdelsalam, jean-pierre.david, pierre.langlois}@polymtl.ca, [email protected]

Abstract—Convolutional Neural Networks (CNNs) have achieved significant success in image classification. One of the main reasons that CNNs achieve state-of-the-art accuracy is their use of many multi-scale learnable windowed feature detectors called kernels. Fetching kernel feature weights from memory and performing the associated multiply and accumulate computations consume massive amounts of energy. This hinders the widespread usage of CNNs, especially in embedded devices. In comparison with CNNs, decision forests are computationally efficient since they are composed of decision trees, which are binary classifiers by nature and can be implemented using AND-OR gates instead of costly multiply and accumulate units. In this paper, we investigate the migration of CNNs to decision forests as a promising approach for reducing both execution time and power consumption while achieving acceptable accuracy. We introduce POLYCiNN, an architecture composed of a stack of decision forests. Each decision forest classifies one of the overlapped sub-images of the original image, and all decision forest classifications are then fused together to classify the input image. In POLYCiNN, each decision tree is implemented in a single 6-input Look-Up Table and requires no memory access. Therefore, POLYCiNN can be efficiently mapped to simple and densely parallel hardware designs. We validate the performance of POLYCiNN on the benchmark image classification tasks of the MNIST, CIFAR-10 and SVHN datasets.

Index Terms—Deep Learning, Decision Forests, Decision Trees, Hardware Accelerators, FPGAs

I. INTRODUCTION

Convolutional Neural Networks (CNNs) have been overwhelmingly dominant in many computer vision problems, especially image classification [1]. The recent success of CNNs is mainly due to the tremendous development of many deep architectures such as AlexNet [2], GoogLeNet [3] and ResNet [1]. These deep CNN architectures are trained to extract representative features from their inputs through several non-linear convolutional layers. Typically, in each convolutional layer many pre-trained windowed feature detectors called kernels are applied to the inputs. One or more fully connected layers connect the top-level extracted features and produce a classification decision.

Although CNNs achieve state-of-the-art accuracy in many tasks, they have deficiencies that limit their use in embedded applications [1]. A main downside of CNNs is their computational complexity. They typically demand many Multiply and Accumulate (MAC) and memory access operations in both training and inference [5]. Another drawback of CNNs is that they require careful selection of multiple hyper-parameters, such as the number of convolutional layers, the number of kernels, the kernel size and the learning rate [1]. This results in a large design space exploration that makes the training process of CNNs time consuming because of the several interfering parameters with many configuration combinations.

Current CNN applications are typically trained and run on clusters of computers with Graphics Processing Units (GPUs). However, the limited throughput of mainstream processors and the high power consumption of GPUs limit their applicability in embedded and edge computing CNN applications [6].

Recently, there has been increased interest in other classifiers that should 1) suit the nature of hardware accelerators by fully utilizing their specific computing resources to maximize parallelism when executing a large number of operations [7] [8]; 2) achieve acceptable classification accuracy [9] [10] [11]; 3) be amenable to finding a robust model for a given task; and 4) be simple to train [12]. Decision Forests (DFs) were introduced as efficient models for classification problems [13]. They operate by constructing a stack of Decision Trees (DTs) and then voting on the most popular output class. Since DTs are binary in nature and can be implemented using AND-OR gates, DFs can be efficiently mapped to simple and densely parallel hardware architectures [14] [15]. Moreover, DFs can be trained quickly and are considered handy classifiers since they do not have many hyper-parameters [16]. However, in contrast to CNNs, DFs do not achieve state-of-the-art accuracy on several applications [8]. CNNs outperform DFs in terms of accuracy because they deploy several convolutional layers with many kernels to extract representative features from raw data. DFs, on the other hand, divide the feature space into subspaces based on simple comparison operations on the input data.

The motivation of this paper is based on three observations. The first observation stems from the fact that CNNs achieve state-of-the-art accuracy by sliding many kernels over images. This motivates us to propose convolutional DFs, where DFs are applied over sliding windows (sub-images) of the original image. The second observation is that most Field-Programmable Gate Arrays (FPGAs) fit any function with

978-1-7281-4074-2/19/$31.00 ©2019 IEEE


Authorized licensed use limited to: Universidad Peruana de Ciencias Aplicadas (UPC). Downloaded on August 30,2024 at 04:33:03 UTC from IEEE Xplore. Restrictions apply.
6-bit inputs in a single Look-Up Table (LUT). Therefore, we limit the number of nodes of the DTs utilized in our convolutional DFs to six. Each DT can thus be optimally implemented in one LUT. This idea could be generalized to wider LUTs. The third observation is that DFs, in contrast to CNNs, are not good feature extractors. We thus integrate a low-complexity feature extraction layer before the proposed convolutional DFs. This allows us to achieve high performance with a significantly reduced amount of resources, and to propose a corresponding architecture.

Training DFs to learn both representative features of the input data and the final classifiers in a joint manner is a difficult problem [9]. This paper thus introduces POLYCiNN, an architecture composed of a stack of DFs. POLYCiNN follows the sliding-kernel idea of CNNs and divides each input image into several overlapped sub-images. Then, POLYCiNN trains a different DF to classify each sub-image. A decision fusion algorithm is applied to combine all DF classifications. In order to achieve near state-of-the-art accuracy, we use a Local Binary Pattern (LBP) layer that extracts representative features of the inputs efficiently and with simple computations. We demonstrate POLYCiNN's capabilities on the MNIST, CIFAR-10 and SVHN datasets.

The specific contributions of this paper are as follows:
• We introduce POLYCiNN, an efficient classifier based on 6-input LUT DTs.
• We integrate a simple LBP feature extraction layer into POLYCiNN and show that we can obtain near state-of-the-art accuracy with a reduced input set of binary parameters.
• We explore different meta-parameters to optimize POLYCiNN in terms of classification accuracy and resource utilization.
• We validate POLYCiNN on the MNIST, CIFAR-10 and SVHN datasets and demonstrate the potential of using POLYCiNN for different applications.

The rest of the paper is organized as follows. Section II provides a review of related works. A detailed view of POLYCiNN, including its training, inference and hardware implementation, is given in Section III. Section IV is dedicated to the experimental results, comparison with other architectures and discussion. Section V concludes the paper.

II. RELATED WORKS

Various approaches have been proposed to simplify the computational and memory requirements of CNNs. One of these approaches is to use a single-bit data representation for inputs and parameters [17]. Courbariaux et al. [18] proposed binary kernels and activations for convolutional layers. Since kernels and activations are binary, the multiply and accumulate operations of convolutional layers can be replaced by XNORs and counters. Several works [11] [19] exploited this paradigm and proposed implementations on different hardware accelerators such as FPGAs and Application-Specific Integrated Circuits (ASICs). Although binary CNNs achieve competitive accuracies on the MNIST, SVHN and CIFAR-10 datasets, they are still expensive in terms of computations and memory accesses since they require many layers and many kernels in each layer.

Another approach to simplifying the computational and memory complexity of CNNs is to implement them as DTs. Frosst et al. [20] proposed a method to distill knowledge from trained neural networks into DTs. This method allows DTs to generalize better than DTs learned directly from the training data. Zhang et al. [21] roughly represented the rationale of each CNN prediction using a semantic DT structure. Both methods achieve acceptable accuracy on the MNIST dataset but lack performance on complex applications such as CIFAR-10 and SVHN. Abdelsalam et al. [7] [22] proposed POLYBiNN, a stack of DTs that replaces the fully connected layers of CNNs. Although it achieves near state-of-the-art accuracy on CIFAR-10, it requires several convolutional layers to extract features from raw data.

The success of CNNs over DFs owes to layer-by-layer processing and in-model feature extraction [1]. Therefore, many works have explored the possibility of building deep layered DFs to extract representative features that achieve near state-of-the-art accuracy. Zhou et al. [9] proposed deep DFs where the output vector of each DF is fed as the input to the next layer of DFs. Miller et al. [23] proposed forward thinking, a general framework for training deep DFs layer by layer. The authors demonstrated a proof of concept of their ideas on the MNIST dataset. However, the idea does not scale up to complex applications such as CIFAR-10 and SVHN. In addition, implementing multi-layer DFs faces the same complexity as implementing CNNs, especially when the DFs have many DTs with many nodes.

III. THE POLYCiNN ARCHITECTURE

In this section, we detail the POLYCiNN architecture, show how we extract representative features using LBP, demonstrate how we train POLYCiNN using simple sliding DFs, and show how it can be implemented in an FPGA.

A. Architecture overview

Fig. 1 shows an overview of the POLYCiNN architecture. POLYCiNN starts by encoding input images using an LBP descriptor, a simple yet powerful descriptor for image classification applications [24]. The feature vectors, alongside a downsampled version of the input image, are fed to a stack of POLYBiNNs [7]. Each POLYBiNN is a DF that is composed of an M × N array of DTs, where M is the number of classes and N is the number of trees per class, followed by a voting circuit. The voting circuit outputs of the different POLYBiNNs are combined using a decision fusion circuit to make the final classification.

Sliding windows and image pyramids play an integral role in image classification since they allow classifiers to localize different objects at various scales [25]. We exploit that concept and divide the input images into w overlapped windows. Moreover, we downsample the original images and divide them into the same number of windows. We train a stack

Fig. 1. Overview of the POLYCiNN architecture with w windows and M classes.

of POLYBiNNs, where each POLYBiNN classifies one image window using the extracted LBP feature vector of the original window and the corresponding window of the downsampled image (DI). Fig. 1 shows an example for the CIFAR-10 dataset, where w = 9, with 16×16 windows and a stride of eight pixels, and the downsampled image is 8×8 with nine windows of size 6×6 and a stride of one pixel.

B. Local Binary Pattern feature extraction

The main goal of this layer is to obtain the most relevant information from the inputs and represent that information in a lower-dimensional space. We choose LBP descriptors [24] because they measure the spatial structure of local image texture efficiently and with simple, parallel computations that fit the nature of most hardware accelerators such as FPGAs. The LBP descriptor is formed by comparing the intensity of the center pixel to its neighboring pixels within a patch. Neighbor pixels with higher intensity than the center pixel are assigned a value of 1, and 0 otherwise. LBP patch sizes are normally 3×3, 5×5, etc.; however, we restrict the intensity comparison of the center pixel to its four adjacent neighbor pixels (top, right, bottom and left), which reduces the memory access cost. This approach is more suitable for hardware implementation since the comparisons are computed row-wise and column-wise, as discussed in Section III.D.

Each pixel is now represented by a 4-bit string computed by comparing the pixel's intensity to the intensities of its four corresponding neighbors. The final feature vector of each window is the histogram of the feature values within the corresponding window. The histogram provides better discrimination of the inputs and reduces the dimensionality of the input space to 16 (all possible values of a 4-bit string). Fig. 2 shows an example of computing the LBP feature vector of a local image window.

C. POLYCiNN training algorithm

We train each POLYBiNN classifier on its corresponding LBP feature vector and downsampled image window. POLYBiNNs are trained using AdaBoost, an ensemble learning algorithm that creates complex classifiers by combining many weak DTs [22]. We limit the number of nodes of each DT to six in order to implement each DT as a single 6-input LUT. Once all N DTs within the same POLYBiNN have been trained, their outputs are combined to produce M decisions (D1 to DM) with M confidences (C1 to CM). The output Dm is the binary decision of class m in the corresponding POLYBiNN and can be 0 or 1. The output Cm is a 2-bit confidence value for the corresponding binary decision Dm. When the training process of all POLYBiNNs is completed, we merge their outputs using a decision fusion approach to obtain a decision for a given input. For each class m, the final confidence CFm is computed by summing all the corresponding Cm together. We select the class with the highest confidence as the final classification decision. Fig. 3 shows the overall process.

D. Implementing POLYCiNN in hardware

As shown in Fig. 1, POLYCiNN consists of an LBP feature extraction layer, a stack of POLYBiNNs, where each POLYBiNN is composed of an array of DTs followed by a voting circuit, and finally a decision fusion circuit that merges the POLYBiNN classifications. We propose an efficient hardware implementation of the LBP layer, as shown in Fig. 4. The architecture is composed of two arrays of comparators: row comparators and column comparators. The row array of comparators compares the intensity of each given pixel with the intensity of its adjacent bottom neighbor pixel in the consecutive row. The column array of comparators compares the intensity of each given pixel with the intensity of its adjacent right neighbor pixel in the consecutive

column. The output of each comparator is assigned a value of 1 or 0, as discussed in Section III.B. In the proposed LBP layer, we compare a given pixel's intensity to its four adjacent neighbor pixels (top, right, bottom and left). However, in the proposed array of comparators we only compute the south and east comparisons. Therefore, the natural and complemented versions of the comparator outputs are used to form the 4-bit string feature value of each color channel of a given pixel. Once the 4-bit string of each channel of all pixels is formed, a set of 16 comparators and accumulators constructs the histogram of the computed feature values and computes the feature vector of each window for a given image. Fig. 4 shows the proposed implementation of the LBP layer in hardware.

Fig. 2. Local binary pattern encoding process.

LBP feature vectors of the different image windows are computed and then fed, alongside their corresponding downsampled images, to the stack of POLYBiNNs, as shown in Fig. 1. Each DT of the different POLYBiNNs corresponds to a Sum of Products (SOP) that is implemented in a single LUT, since the number of inputs is six, a constraint set during the training process of POLYCiNN. Since DTs are binary classifiers by nature, their inputs should be binary. Therefore, the extracted feature vectors and the downsampled images, which serve as the DTs' inputs, are binarized with thresholds that are learned during the training process of POLYCiNN. A set of comparators binarizes the extracted feature vectors and downsampled images according to the learned thresholds.

Fig. 3. Decision forests and decision fusion implementation of POLYCiNN.

The decisions Dwm and confidences Cwm of each POLYBiNN of the corresponding image window are computed without any arithmetic operations, as detailed in [7]. The w confidences of each class m are summed using a set of M accumulators in the decision fusion circuit, as shown in Fig. 3. A set of pipelined comparators is used in the Argmax block to select the class with the highest confidence. Since the POLYCiNN architecture is well suited to parametrized descriptions, we expanded and adapted the tool proposed in [7] to generate a synthesizable HDL description of POLYCiNN given a set of parameters.

Fig. 4. Local binary pattern hardware implementation in POLYCiNN.

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

This section presents and analyzes the classification accuracy of POLYCiNN with different parameters, compares it to the literature and discusses its hardware cost.

A. POLYCiNN Classification Performance

We tested POLYCiNN on the MNIST (28×28 handwritten digits from 0 to 9), CIFAR-10 (32×32 color images in 10 categories) and SVHN (32×32 color images of street view house numbers in 10 categories) datasets. We used the binary version of MNIST, where each pixel is represented in one bit. In the case of CIFAR-10 and SVHN, we only used the 4 MSBs of each pixel value in the three color channels. We considered three different sets of experiments for classifying the datasets. The first set classified the three datasets when training POLYCiNN using raw data. In the second set of

experiments, we trained POLYCiNN using the extracted LBP feature vectors. The third set trained POLYCiNN using the extracted LBP feature vectors alongside the downsampled image.

For comparison purposes, we reproduced the results of training POLYBiNN [7] on the raw data of the three datasets. We also studied the effect of changing the number of DTs per POLYBiNN on the accuracy. We trained POLYBiNN for classifying the three datasets with 100, 500 and 1000 DTs per class. In the case of POLYCiNN, we trained it with 100, 500 and 1000 DTs per class for each window. For CIFAR-10 and SVHN, we divided each image into nine windows of size 16×16 with a stride of eight pixels. For MNIST, we divided each image into nine windows of size 22×22 with a stride of three pixels.

Fig. 5 shows the accuracy for MNIST, CIFAR-10 and SVHN as a function of the number of DTs in the different experiments. On all three datasets, POLYCiNN outperforms POLYBiNN in terms of classification accuracy when both are trained using raw data. This is because POLYCiNN has the advantage of using sliding DFs. The accuracy curves for CIFAR-10 and SVHN indicate that using LBP features is a powerful approach that can achieve the same performance as using raw data. Moreover, Fig. 5 shows that training POLYCiNN using LBP features and downsampled images achieves higher classification accuracy on all three datasets. This is because each POLYBiNN is trained using the LBP feature vector of its corresponding window alongside its neighboring area from the downsampled image. The accuracy of POLYCiNN is sensitive to the number of DTs up to a limit where the classification accuracy saturates, as shown in Fig. 5. Table I compares the accuracy of POLYCiNN on the MNIST, CIFAR-10 and SVHN datasets with prior works. Although POLYCiNN is inferior to state-of-the-art CNNs [2] [3] [4] in terms of accuracy, it obtains better results than other DF approaches [7] [8] [9].

TABLE I
ACCURACY COMPARISON WITH EXISTING DECISION TREE APPROACHES

            Accuracy (%)
          MNIST   CIFAR-10   SVHN
[9]       99.10   -          -
[20]      96.76   -          -
[8]       99.26   63.37      -
[7]       97.45   55.12      71.68
POLYCiNN* 98.23   63.43      76.35

* Nine windows per image and 1000 DTs per class of each window.

POLYCiNN with the LBP feature extraction layer and downsampled image suits the CIFAR-10 and SVHN datasets more than the MNIST dataset, as shown in Fig. 5. This is because the variability of the CIFAR-10 and SVHN datasets in terms of image translations, scales, rotations, color spaces and geometrical deformations is much greater than that of the MNIST dataset. When a space of high variability is divided into sub-spaces using DFs, the variations become less evident and the gains from fusing classifiers over all sub-spaces are notable. On the other hand, when there are few variations, the gains of space division are less notable since there is not much reduction in variability.

B. Hardware implementation

The hardware implementation of POLYCiNN is composed of three main parts: 1) the LBP feature extraction, 2) the decision forests of decision trees and 3) the decision fusion circuit. As discussed in Section III.D, the simple computations needed to compute the LBP feature vectors should not hinder the implementation process since these features are computed using arrays of comparators. The delay caused in this part by waiting for neighbor pixels to be loaded is negligible. This is because the arrays of comparators access the memory symmetrically (row by row and column by column). Moreover, this approach to implementing LBP can be parallelized by using many arrays of comparators, which increases the throughput at the cost of computational and memory access resources.

Concerning the second part, all DTs have at most six decision nodes. Consequently, each DT is implemented in a single LUT. It should be noted that DTs with more nodes can be utilized to increase the classification accuracy; however, this requires more LUTs to implement these DTs. The third part, the decision fusion circuit, uses a few accumulators and a set of pipelined comparators that should not restrict the implementation capabilities. Future work will focus on experimenting with different implementations of POLYCiNN in FPGAs.

V. CONCLUSION

This paper presented POLYCiNN, a classifier inspired by CNNs and Decision Forest (DF) classifiers. POLYCiNN comprises a stack of DFs, where each DF classifies one of the overlapped image windows. POLYCiNN deploys an efficient LBP feature extraction layer that improves its classification accuracy. We demonstrated that POLYCiNN achieves the same accuracy as prior DF approaches on the MNIST, CIFAR-10 and SVHN datasets. From a hardware perspective, POLYCiNN can be implemented using efficient computational and memory resources. Moreover, it can be configured to suit various hardware accelerators and embedded devices.

VI. ACKNOWLEDGMENTS

The authors would like to thank Imad Benacer and Siva Chidambaram for their insightful comments.

REFERENCES

[1] Y. LeCun, Y. Bengio and G. Hinton, "Deep learning." Nature, May 2015.
[2] A. Krizhevsky, I. Sutskever and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks." Advances in Neural Information Processing Systems, 2012.
[3] C. Szegedy et al., "Going Deeper with Convolutions." IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[4] S. Han, H. Mao and W. J. Dally, "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv preprint:1510.00149, Oct. 2015.
[5] D. Hunter, H. Yu, M. S. Pukish, J. Kolbusz and B. M. Wilamowski, "Selection of Proper Neural Network Sizes and Architectures—A Comparative Study." IEEE Trans. on Industrial Informatics, May 2012.

Fig. 5. POLYCiNN accuracy for the CIFAR-10, SVHN and MNIST datasets.

[6] E. Nurvitadhi, G. Venkatesh, J. Sim, D. Marr, R. Huang, J. Hock, Y. T. Liew, K. Srivatsan, D. Moss, S. Subhaschandra and G. Boudoukh, "Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks?" ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Feb. 2017.
[7] A. M. Abdelsalam, A. Elsheikh, J. P. David and J. M. P. Langlois, "POLYBiNN: A Scalable and Efficient Combinatorial Inference Engine for Neural Networks on FPGA." IEEE Conference on Design and Architectures for Signal and Image Processing, Oct. 2018.
[8] Z. H. Zhou and J. Feng, "Deep Forest: Towards an Alternative to Deep Neural Networks." arXiv preprint:1702.08835, Feb. 2017.
[9] P. Kontschieder, M. Fiterau, A. Criminisi and S. B. Rota, "Deep Neural Decision Forests." IEEE International Conference on Computer Vision, 2015.
[10] P. Gysel, M. Motamedi and S. Ghiasi, "Hardware-Oriented Approximation of Convolutional Neural Networks." arXiv preprint:1604.03168, Apr. 2016.
[11] K. Abdelouahab, M. Pelcat, J. Serot and F. Berry, "Accelerating CNN Inference on FPGAs: A Survey." arXiv preprint:1806.01683, May 2018.
[12] C. Hettinger, T. Christensen, B. Ehlert, J. Humpherys, T. Jarvis and S. Wade, "Forward Thinking: Building and Training Neural Networks One Layer at a Time." arXiv preprint:1706.02480, Jun. 2017.
[13] L. Breiman, "Random Forests." Machine Learning, Oct. 2001.
[14] S. B. Akers, "Binary Decision Diagrams." IEEE Transactions on Computers, Jun. 1978.
[15] P. T. Tang, "Table-Lookup Algorithms for Elementary Functions and Their Error Analysis." IEEE Symposium on Computer Arithmetic, Jun. 1991.
[16] T. Hastie, R. Tibshirani and J. H. Friedman, "The Elements of Statistical Learning." Springer Series in Statistics, 2009.
[17] Y. Cheng, D. Wang, P. Zhou and T. Zhang, "A Survey of Model Compression and Acceleration for Deep Neural Networks." arXiv preprint:1710.09282, Oct. 2017.
[18] M. Courbariaux, I. Hubara, D. Soudry, R. E. Yaniv and Y. Bengio, "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1." arXiv preprint:1602.02830, Feb. 2016.
[19] E. Wang, J. J. Davis, R. Zhao, H. C. Ng, X. Niu, W. Luk, P. Y. Cheung and G. A. Constantinides, "Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going." arXiv preprint:1901.06955, Jan. 2019.
[20] N. Frosst and G. Hinton, "Distilling a Neural Network into a Soft Decision Tree." arXiv preprint:1711.09784, Nov. 2017.
[21] Q. Zhang, Y. Yang, Y. N. Wu and S. C. Zhu, "Interpreting CNNs via Decision Trees." arXiv preprint:1802.00121, Feb. 2018.
[22] A. M. Abdelsalam, A. Elsheikh, S. Chidambaram, J. P. David and J. M. P. Langlois, "POLYBiNN: Binary Inference Engine for Neural Networks using Decision Trees." Journal of Signal Processing Systems, May 2019.
[23] K. Miller, C. Hettinger, J. Humpherys, T. Jarvis and D. Kartchner, "Forward Thinking: Building Deep Random Forests." arXiv preprint:1705.07366, May 2017.
[24] T. Ojala, M. Pietikainen and T. Maenpaa, "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns." IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 971-987, Jul. 2002.
[25] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Publishing House of Electronics Industry, Mar. 2002.
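As an illustrative aside, the 4-neighbor LBP encoding and 16-bin histogram described in Section III.B can be sketched as a short software reference model in Python. This is not the paper's hardware implementation; the function name, NumPy array conventions and the edge-padding choice at window borders are our own assumptions:

```python
import numpy as np

def lbp_features(window):
    """Software sketch of the 4-neighbor LBP descriptor: each pixel is
    compared to its top, right, bottom and left neighbors, yielding a
    4-bit code, and the window is summarized by a 16-bin code histogram."""
    w = np.asarray(window, dtype=np.int32)
    h, c = w.shape
    # Edge-pad so border pixels have defined neighbors (a modeling choice).
    p = np.pad(w, 1, mode="edge")
    top = (p[0:h, 1:c + 1] > w).astype(np.uint8)
    right = (p[1:h + 1, 2:c + 2] > w).astype(np.uint8)
    bottom = (p[2:h + 2, 1:c + 1] > w).astype(np.uint8)
    left = (p[1:h + 1, 0:c] > w).astype(np.uint8)
    # Pack the four comparison bits into one 4-bit code per pixel.
    codes = (top << 3) | (right << 2) | (bottom << 1) | left
    # The window's feature vector: histogram over all 16 possible codes.
    return np.bincount(codes.ravel(), minlength=16)
```

On a uniform window every comparison fails, so all pixels map to code 0 and the histogram collapses into a single bin; this matches the reduction of the input space to 16 histogram bins noted in Section III.B.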


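The decision fusion rule of Section III.C (sum the per-window confidences Cm of each class and select the argmax) can likewise be written as a small software sketch. This NumPy model treats the 2-bit confidences as small integers in a (w, M) array; the array layout and function name are assumptions, not part of the original design:

```python
import numpy as np

def fuse_decisions(confidences):
    """Decision fusion sketch: `confidences` is a (w, M) array holding each
    window's per-class confidence C_m. The final confidence CF_m sums the
    w window confidences; the predicted class is the argmax over CF_m."""
    conf = np.asarray(confidences)
    cf = conf.sum(axis=0)          # CF_m for each of the M classes
    return int(np.argmax(cf)), cf  # winning class index and all CF_m
```

For example, with w = 3 windows and M = 4 classes, fuse_decisions([[0, 2, 1, 0], [1, 3, 0, 0], [0, 2, 0, 1]]) selects class 1 with summed confidences [1, 7, 1, 1], mirroring the accumulator-plus-Argmax circuit of Fig. 3.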