
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 69, NO. 12, DECEMBER 2022

Reconfigurable Hardware Design Approach for Economic Neural Network

Kasem Khalil, Member, IEEE, Ashok Kumar, Senior Member, IEEE, and Magdy Bayoumi, Life Fellow, IEEE

Manuscript received 16 June 2022; accepted 29 June 2022. Date of publication 15 July 2022; date of current version 2 December 2022. This brief was recommended by Associate Editor L. A. Camunas-Mesa. (Corresponding author: Kasem Khalil.)
Kasem Khalil is with the Electrical and Computer Engineering Department, University of Mississippi, Oxford, MS 38677 USA, and also with the Electrical Engineering Department, Assiut University, Asyut 71515, Egypt (e-mail: [email protected]; [email protected]).
Ashok Kumar is with the Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504 USA (e-mail: [email protected]).
Magdy Bayoumi is with the Department of Electrical and Computer Engineering, University of Louisiana at Lafayette, Lafayette, LA 70504 USA (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2022.3191342.
Digital Object Identifier 10.1109/TCSII.2022.3191342

Abstract—Hardware-based neural networks are becoming attractive because of their superior performance. One of the research challenges is to design such hardware using less area to minimize the cost of on-chip implementation. This brief proposes an area-efficient implementation of an Artificial Neural Network (ANN). The proposed method reduces the number of layers in the ANN by nearly half through a novel, dual use of some layers denoted as hidden layers. The hidden layers are non-traditional layers proposed in this brief. They are adaptable, and each such layer performs two separate functions through judicious use of different weights. Thus, each hidden or flexible layer does the work of two traditional ANN layers. The other type of layer used in the proposed design is the fixed layer, which is used traditionally: fixed layers are not flexible and serve the functionality of a single layer. The proposed design keeps the number of fixed layers as low as possible, although one or more fixed layers may still be needed for some applications besides the proposed flexible layers. The proposed method is implemented in Verilog HDL on an Altera Arria 10 GX FPGA. Its area usage is 41% lower than that of the state-of-the-art method, while its power consumption and speed overheads are small.

Index Terms—Artificial neural network, hardware neural network, image recognition, pattern recognition, FPGA implementation.

I. INTRODUCTION

ANN is increasingly being used in many applications such as clustering, classification [1], climate forecasting, disease diagnosis [2], prediction, pattern recognition [3], image [4] and speech recognition [5], natural language processing, and decision making. ANNs typically provide advantages in terms of scalability, processing speed, and accuracy [6]–[10]. The most attractive benefit of ANN is its high-speed processing, which can be realized through parallel implementation. The current challenge in designing ANN is to implement it in hardware at a low cost in terms of area and power [11], [12]. This brief focuses on the hardware design of ANN with fewer components and a reduced area for on-chip use.

Nguyen et al. [13] present hardware implementations of ANN with stochastic computing and conventional binary radix to reduce the area cost. Their method is tested using the MNIST dataset and reported to attain 92.18% accuracy, which is on par with the traditional (baseline) method. Their method is implemented on FPGA and significantly reduces LUTs; the estimated overall area reduction is 30% compared to the baseline network (a straight feedforward neural network). Zyarah et al. [14] present two techniques for hardware implementation of a feedforward neural network based on reusing resources: coalescing and folding. In the first, a single stack of nodes is used for both feature extraction and classification over a shared path. The second approach is a folding technique, in which nodes are folded to execute several tasks within a high-dimension network layer. The methods are tested using the MNIST dataset on a binary classification task; they achieve 91.2% accuracy and provide a 13.17% area reduction compared with the baseline network. Rasul et al. [15] present a method for minimizing the physical number of interconnects between the network nodes, based on time-division multiplexing of the communication between nodes over a single channel. The main drawbacks of this method are its scalability for complex networks and its computational complexity, which increases the power consumption of each node. A hardware-reconfigurable ANN approach is presented in [16]. This method provides the flexibility to change the node organization in hardware to suit many applications; the reconfigurable nodes have the ability to switch from one layer to another to speed up the network or to attain a certain desired performance.

A neural network uses many layers for complex applications, which in turn requires a large silicon area for hardware implementation [11], [17], [18]. An ANN includes a collection of nodes, with connections between nodes used to transmit signals, as shown in Fig. 1. Each node's input is multiplied by a weight, and the result is applied to an adder accumulator. An activation function receives the resulting data from the adder accumulator to produce the final result [19], [20]. The final output is given by equation (1):

yi = f( Σ_{j=0}^{n} Wji * Xj + bi )    (1)

In equation (1), Xj is the jth neuron output in the preceding layer, Wji is the synaptic weight from the jth neuron to the ith neuron in the current layer, n is the number of neurons, f is the activation function, and bi is the bias.

Fig. 1. Neural network architecture.
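As a concrete illustration of equation (1), the sketch below computes one neuron's output in Python. It is a behavioral model for exposition only, not the brief's Verilog implementation, and the sigmoid activation is an assumption, since the brief does not fix a particular f.

```python
import numpy as np

def sigmoid(x):
    # One common choice for the activation function f; the brief does
    # not specify which activation the hardware implements.
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(x_prev, w, b, f=sigmoid):
    # Equation (1): y_i = f(sum_j W_ji * X_j + b_i), where x_prev holds
    # the outputs X_j of the preceding layer and w the weights W_ji.
    return f(np.dot(w, x_prev) + b)

# Example: a neuron with three inputs.
y = neuron_output(np.array([0.5, -1.0, 2.0]),
                  np.array([0.1, 0.4, -0.2]),
                  b=0.05)
print(y)
```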
This brief's contribution is to propose the hardware structure of a neural network with fewer layers than the state-of-the-art method, resulting in a 41% area reduction. The proposed method also has the flexibility to configure and update the number of layers in the on-chip design. The remainder of this brief is organized as follows. Section II presents the proposed ANN architecture. Section III presents the implementation and experimental results of the proposed method, followed by the conclusion in Section IV.

II. THE PROPOSED METHOD

The hardware implementation of ANN is complex, and this brief proposes an area-economic hardware design. The proposed method divides the hidden layers into two parts: fixed and adaptable. The fixed part consists of m layers that are the same as in the traditional network (i.e., the baseline method, which is a regular feedforward neural network). A fixed layer is used only if the required number of layers is odd.

|L| = 2k + m    (2)
N = k + m    (3)

In equation (2), |L| is the number of equivalent software layers (or the number of baseline hardware layers), k is the number of adaptable hardware layers, m is the number of fixed hardware layers, and m is odd. N is the total number of hardware layers in the proposed design. If |L| is even, no fixed layer is needed (i.e., |L| = 2k and m = 0), and each layer doubles up in functionality. The adaptable part is used to cut the number of adaptable layers in half. The proposed method is shown in Fig. 2. The adaptable layers are used in the first round with weight W0; the output of the last layer then returns to the first adaptable layer to continue the operation as extended layers using a different weight, W1. The idea of using adaptable layers is thus based on using each layer twice. Each node has two weights instead of one to accomplish such reuse. These weights are obtained through neural network learning, and each weight is updated during its round. In this way, each proposed layer works as two successive layers, each with its own learned weight.

Fig. 2. Block diagram of the proposed method.

The traditional node structure consists of a Multiplier Adder Accumulator (MAC), memory to store the weight value, and an activation function, as shown in Fig. 3. In the proposed method, the node structure consists of a MAC, memory to store two weight values (W0 and W1), a configuration block, and the activation function, as shown in Fig. 4. The configuration block is used to select the weight corresponding to the current round.

Fig. 3. The traditional neural network node architecture.
Fig. 4. The proposed neural network node architecture.
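A behavioral sketch of the proposed node in Fig. 4 is given below. The class name, interface, and sigmoid activation are invented here for illustration; the actual design is Verilog RTL, not software.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AdaptableNode:
    """Behavioral model of the proposed node: one MAC shared by two
    weight memories (W0 and W1); the round_sel argument plays the role
    of the configuration block that picks the weight set per round."""

    def __init__(self, w0, w1, bias, f=sigmoid):
        self.weights = (np.asarray(w0, dtype=float),
                        np.asarray(w1, dtype=float))
        self.bias = bias
        self.f = f

    def forward(self, x, round_sel):
        w = self.weights[round_sel]               # configuration block
        return self.f(np.dot(w, x) + self.bias)   # MAC + activation

node = AdaptableNode(w0=[0.2, -0.1], w1=[0.05, 0.3], bias=0.0)
x = np.array([1.0, 0.5])
y_round0 = node.forward(x, round_sel=0)  # node acts as layer i
y_round1 = node.forward(x, round_sel=1)  # same node acts as layer i + k
```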
The proposed method reduces the number of adaptable hardware layers to half, and the final number of layers fl is given by equation (4):

fl = m + (h − m)/2 = (h + m)/2    (4)

In equation (4), h is the total number of equivalent hidden layers.
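The layer-count bookkeeping of equations (2)–(4) can be sanity-checked with a few lines of Python; the helpers below are illustrative and not part of the brief.

```python
def hardware_layer_counts(L_eq, m):
    # Equations (2)-(3): |L| = 2k + m and N = k + m, where k adaptable
    # layers each stand in for two equivalent layers and m fixed layers
    # are used once. L_eq and m must have the same parity.
    assert (L_eq - m) % 2 == 0 and m >= 0
    k = (L_eq - m) // 2        # adaptable hardware layers
    N = k + m                  # total hardware layers
    return k, N

def final_hidden_layers(h, m):
    # Equation (4): fl = m + (h - m)/2 = (h + m)/2, with h the total
    # number of equivalent hidden layers.
    return (h + m) // 2

# Section II's example: ten equivalent layers with m = 2 fixed layers
# give k = 4 adaptable layers and N = 6 hardware layers.
print(hardware_layer_counts(10, 2))  # (4, 6)
print(final_hidden_layers(10, 2))    # 6
```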

The operation of the proposed method is as follows. The network receives the inputs via the input layer, and the computation proceeds through the m fixed hidden layers. The result of the last fixed layer is applied to the first adaptable layer, as shown in Fig. 5. Two layers of multiplexers are used in the adaptable section: one before the first adaptable layer and the other after the last layer. These multiplexers select memory weight W0 in the first round and memory weight W1 during the second round. The selection is made by selection signals S0 and S1 for the first and last layers, respectively. S0 forwards either the output of the last fixed layer or the result of the last hidden layer to the first adaptable layer. S1 either isolates the output layer using the high-impedance 'z' state or forwards the result of the last hidden layer to the output layer.

Fig. 5. The proposed neural network architecture.

During the first round, S0 = '0' forwards the result of the last fixed layer to the first adaptable layer, and the remaining layers perform their operation using the W0 weights. At the end of the last layer's operation, the selection signal S1 = '0', so no data is forwarded to the final output layer; instead, the result of the last hidden layer returns to the multiplexer input of the first adaptable layer. At this time, S0 switches from '0' to '1', and the multiplexer forwards the result of the last layer to the input of the first adaptable layer. The adaptable layers then work as extended layers by using W1 for computation. Every layer performs its operation until the last layer, and S1 switches from '0' to '1' to forward the last layer's result to the output layer, producing the final result. Therefore, the proposed method reduces the hardware cost of ANN by using each layer as two layers.
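The round control described above can be condensed into a short behavioral sketch. The function below is an illustration of the two-round scheme under an assumed callable-layer interface, not the authors' RTL.

```python
def run_network(x, fixed_layers, adaptable_layers, rounds=2):
    """Behavioral model of the proposed dataflow.

    fixed_layers: callables applied once (the m fixed layers).
    adaptable_layers: callables taking (x, round_sel); each is reused
    in every round with a different weight set (W0, then W1).
    """
    for layer in fixed_layers:          # single pass, as usual
        x = layer(x)
    for r in range(rounds):
        # S0 = '0' on round 0 (take the last fixed layer's output);
        # S0 = '1' afterwards (take the fed-back result).
        for layer in adaptable_layers:
            x = layer(x, r)
        # S1 stays '0' (output isolated, result fed back) until the
        # final round, when it switches to '1'.
    return x                            # forwarded to the output layer
```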
Assume a network with six layers, two of which (L1 and L2) are used as fixed layers while the other four (L3, L4, L5, and L6) are used as adaptable layers. The network runs as follows. The input layer forwards its data to L1, and L1 forwards its result to the next layer, L2. S0 is '0', forwarding the result of L2 to L3. L3 performs its computation using the first weight W0, and the next layers perform their computation in the same way until the last layer, L6. The multiplexer signal S1 has a value of '0'; thus, the output layer is isolated and the output of L6 is fed back to the multiplexer of L3. S0 switches to '1', and the multiplexer sends the output of L6 to the input of L3. The layer L3 now works as the 7th layer with weight W1, L4 works as the 8th layer, L5 runs as the 9th layer, and the last layer L6 runs as the 10th layer. S1 then switches from '0' to '1' to forward the result of L6 to the output layer. Therefore, the sequence of network operation is L1 → L2 → L3 → L4 → L5 → L6 → L3 → L4 → L5 → L6. The proposed method thus provides the operation of ten layers using six layers (N = 6 and m = 2), saving four layers ((m + N)/2 = 4).
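Tracing this six-layer example with placeholder layers reproduces the stated sequence; the snippet is purely illustrative.

```python
trace = []

def make_layer(name):
    # Placeholder layer that records its firing order; real layers
    # would compute equation (1) for each node.
    def layer(x, r=None):
        trace.append(name)
        return x
    return layer

fixed = [make_layer("L1"), make_layer("L2")]
adaptable = [make_layer(n) for n in ("L3", "L4", "L5", "L6")]

x = 0.0
for layer in fixed:
    x = layer(x)
for r in range(2):                 # two rounds through L3..L6
    for layer in adaptable:
        x = layer(x, r)

print(" -> ".join(trace))
# L1 -> L2 -> L3 -> L4 -> L5 -> L6 -> L3 -> L4 -> L5 -> L6
```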
The proposed method can also be used to reduce the total number of hidden layers to half (N/2) by using all hidden layers as adaptable layers (m = 0 indicates there are no fixed layers, implying an even number of equivalent layers). The S0 multiplexers are placed before the first hidden layer, and the second set of multiplexers after the last layer, as shown with the dashed line in Fig. 2. For example, a network with six layers works as a network with 12 layers (L1 → L2 → L3 → L4 → L5 → L6 → L1 → L2 → L3 → L4 → L5 → L6). Thus, the proposed method pares the number of layers down to half. The implementation and experimental details are described next.

III. IMPLEMENTATION AND EXPERIMENTAL RESULTS

The proposed method is evaluated using three datasets: MNIST, CIFAR-10, and USPS. The MNIST dataset contains handwritten digit images (0-9), with 60,000 images in the training set and 10,000 images in the test set [21]. The CIFAR-10 dataset is a labeled subset of the 80 million tiny images collection; it has 60,000 32x32 color images in 10 classes, with 6,000 images per class [22]. In this dataset, 50K samples are used for training and 10K samples for testing. The USPS dataset is a postal library of American postal services, and it includes 9,000 samples for recognition [23]. The proposed method is implemented with different numbers of layers and compared with the baseline method in terms of accuracy. The four-hidden-layer network uses 1024, 512, 128, and 10 nodes; the six-hidden-layer network uses 1024, 512, 256, 128, and 10 nodes; and the eight-hidden-layer network uses 1024, 512, 256, 128, 64, and 10 nodes. The proposed method is designed using two fixed layers (m = 2) with four, six, and eight layers, as shown in Table I. The results show that the proposed method has higher accuracy than the baseline method with the same number of layers. This is due to the adaptable layers being used twice: for example, using four layers, the proposed method runs as a six-layer network because the two adaptable layers work as four layers, while the baseline method has only four layers. Therefore, it has higher accuracy than the baseline method, as shown in Table I. The implementation is repeated for different numbers of layers to show its consistency and efficiency compared to the baseline (traditional) method, and it is tested on all three datasets for validation, as shown in Table I.

The proposed method with m = 0 is studied using the three datasets. In this case, the network effectively works with double the number of layers because each layer works as two; for example, a network with four layers works as a network of eight layers. The proposed method is compared with the baseline method, as shown in Table II.
TABLE I
COMPARISON BETWEEN THE PROPOSED METHOD AND THE BASELINE METHOD WITH m = 2

TABLE II
COMPARISON BETWEEN THE PROPOSED METHOD AND THE BASELINE METHOD WITH m = 0

TABLE III
COMPARISON OF RESOURCES UTILIZATION ON THE HARDWARE IMPLEMENTATION

The results show that the proposed method is more efficient. For example, for a six-layer network on the MNIST dataset, the proposed method has an accuracy of 97.1%, while the baseline method has an accuracy of 94.9%. Thus, the proposed method improves performance with a low number of layers. Both methods have the same number of physical layers, but the proposed method acts as double the layers compared to the traditional method and gives higher accuracy, because fewer effective layers result in a higher loss in accuracy. The proposed method is studied for both m = 0 and m ≠ 0 with different numbers of hidden layers, as shown in Fig. 6. In both cases, the proposed method provides efficient performance compared to the baseline (traditional) method. For example, using m = 2 with ten hidden layers, the proposed method has an accuracy of 97.2% while the conventional method has 96.4%; using m = 0, the proposed method reaches an accuracy of 98% while the conventional method remains at 96.4%.

Fig. 6. The accuracy result for different layers.

The proposed method is implemented in Verilog HDL on an Altera Arria 10 GX FPGA at a frequency of 120 MHz, using datapath widths of 32 and 64 bits. The proposed method is economical in hardware because it uses fewer hidden layers: the number of layers is reduced to (m + N)/2, or to N/2 when m = 0. The computation time of the proposed method is derived experimentally and is given by equation (5):

tcomputation = TN + (0.1(nr − 1)/N) TN    (5)

In equation (5), tcomputation is the overall network computation time, TN is the computation time of N layers, nr is the number of rounds, and N is the number of layers. The computation time equals TN when nr = 1, which is the case for a normal neural network. Thus, the computation-time overhead is (0.1(nr − 1)/N) TN; for two rounds, as described previously, the overhead is (0.1/N) TN. The overhead increases as the number of rounds increases, and the number of rounds (iterations) depends on the application.
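Equation (5) is straightforward to evaluate numerically. The sketch below reproduces the example reported later in this section, where N = 4, nr = 2, and TN = 53 s give roughly 54.3 s.

```python
def computation_time(T_N, N, n_rounds):
    # Equation (5): t_computation = T_N + (0.1 * (n_r - 1) / N) * T_N.
    return T_N + (0.1 * (n_rounds - 1) / N) * T_N

# Two rounds on a 4-layer network with T_N = 53 s:
# 53 * (1 + 0.1 / 4) = 54.325 s, matching the reported 54.3 s.
print(computation_time(53.0, N=4, n_rounds=2))
```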
The proposed method therefore carries only a small computation-time overhead, as given by equation (5). The hardware resource consumption for m = 0 and N = 4 layers, where each layer includes 100 neurons, is shown in Table III for both the proposed and traditional methods. The results are given in terms of registers, LUTs, DSPs, buffers, block RAM, and flip-flops (FF); they present the used resources and their utilization, which is the ratio of used resources to the total available resources. The proposed method is also compared with the baseline method for m = 2 and N = 4 layers, each layer again with 100 neurons, as shown in Table III. The results show that the proposed method has resource usage of 2278, 22029, 7, 12, 9, 199, and 407 for the registers, LUTs, DSPs, buffers, block RAM, and FF, respectively, while the baseline method has resource usage of 2817, 2292, 8, 12, 10, 292, and 597, respectively.
TABLE IV
HARDWARE IMPLEMENTATION RESULTS

TABLE V
COMPARISON WITH PREVIOUS WORK

The proposed method is synthesized using Synopsys Design Compiler in 45-nm technology and found to have an area reduction of 41% compared to the traditional method. It has a power consumption of 44 mW (31.68 mW (72%) internal power and 12.32 mW (28%) switching power), while the traditional method has a power consumption of 42 mW, as shown in Table IV. The small overhead is due to the multiplexer computation. The time usage is studied with N = 4 and nr = 2; the baseline method has a time usage of TN = 53 s, and the proposed method has tcomputation = 54.3 s. As shown in Table V, the proposed method is efficient, with lower hardware cost and higher accuracy compared with previously known work in the literature. For example, the classification accuracy on the MNIST dataset is 92.18% with a 30% area reduction for [13], and 91.2% with a 13.17% area reduction for [14]; the accuracies are 93% and 94.2% for [12] and [24], respectively. Meanwhile, the proposed method has an accuracy of 96.8%. The overall results show that the proposed method is very efficient and economical, and its cost is small compared to its benefits.
IV. CONCLUSION

This brief proposed a method for low-cost hardware implementation of ANN. The proposed method utilizes each hidden layer twice, with one operation each for two different layers. Each node has two separate weight memories: the weight W0 is used for the first operation and W1 for the second. The method was tested using multiple datasets and multiple hidden-layer configurations. It was implemented in Verilog HDL on an Altera Arria 10 GX FPGA and showed a 41% reduction in area compared to the state-of-the-art method. The proposed method can be used to reduce the number of layers to N/2, and it is also scalable. It has a power consumption of 44 mW, which is 9% more than the state-of-the-art methods, but the advantages in terms of area reduction and accuracy are significant. The results showed that the proposed method reduces hardware components while preserving performance efficiency. The proposed method can easily be adapted for use as a stage of a convolutional neural network for complex problems such as speech recognition and complex image classification.

REFERENCES

[1] A. Alghoul, S. Al Ajrami, G. Al Jarousha, G. Harb, and S. S. Abu-Naser, "Email classification using artificial neural network," Int. J. Acad. Develop., vol. 2, no. 11, pp. 8–14, 2018.
[2] A. S. Musallam, A. S. Sherif, and M. K. Hussein, "A new convolutional neural network architecture for automatic detection of brain tumors in magnetic resonance imaging images," IEEE Access, vol. 10, pp. 2775–2782, 2022.
[3] D. M. D'Addona, A. S. Ullah, and D. Matarazzo, "Tool-wear prediction and pattern-recognition using artificial neural network and DNA-based computing," J. Intell. Manuf., vol. 28, no. 6, pp. 1285–1301, 2017.
[4] K. Khalil, O. Eldash, A. Kumar, and M. Bayoumi, "Designing novel AAD pooling in hardware for a convolutional neural network accelerator," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 30, no. 3, pp. 303–314, Mar. 2022.
[5] C.-W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), 2017, pp. 583–588.
[6] R. Shi et al., "FTDL: A tailored FPGA-overlay for deep learning with high scalability," in Proc. 57th ACM/IEEE Design Autom. Conf. (DAC), 2020, pp. 1–6.
[7] M. Wang, Y. Wang, C. Liu, and L. Zhang, "Network-on-interposer design for agile neural-network processor chip customization," in Proc. 58th ACM/IEEE Design Autom. Conf. (DAC), 2021, pp. 49–54.
[8] A. Mozaffari, M. Emami, and A. Fathi, "A comprehensive investigation into the performance, robustness, scalability and convergence of chaos-enhanced evolutionary algorithms with boundary constraints," Artif. Intell. Rev., vol. 52, no. 4, pp. 2319–2380, 2019.
[9] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, and H. Arshad, "State-of-the-art in artificial neural network applications: A survey," Heliyon, vol. 4, no. 11, 2018, Art. no. e00938.
[10] M. A. Hanif and M. Shafique, "DNN-Life: An energy-efficient aging mitigation framework for improving the lifetime of on-chip weight memories in deep neural network hardware architectures," in Proc. Design Autom. Test Europe Conf. Exhibit. (DATE), 2021, pp. 729–734.
[11] N. Zheng and P. Mazumder, "Artificial neural networks in hardware," in Learning in Energy-Efficient Neuromorphic Computing: Algorithm and Architecture Co-Design. Hoboken, NJ, USA: Wiley, 2020, pp. 61–118, doi: 10.1002/9781119507369.ch3.
[12] T. T. T. Bui and B. Phillips, "A scalable network-on-chip based neural network implementation on FPGAs," in Proc. IEEE-RIVF Int. Conf. Comput. Commun. Technol. (RIVF), 2019, pp. 1–6.
[13] D.-A. Nguyen, H.-H. Ho, D.-H. Bui, and X.-T. Tran, "An efficient hardware implementation of artificial neural network based on stochastic computing," in Proc. 5th NAFOSTED Conf. Inf. Comput. Sci. (NICS), 2018, pp. 237–242.
[14] A. M. Zyarah and D. Kudithipudi, "Resource sharing in feed forward neural networks for energy efficiency," in Proc. IEEE 60th Int. Midwest Symp. Circuits Syst. (MWSCAS), 2017, pp. 543–546.
[15] R. A. Rasul, P. Teimouri, and M. S.-W. Chen, "A time multiplexed network architecture for large-scale neuromorphic computing," in Proc. IEEE 60th Int. Midwest Symp. Circuits Syst. (MWSCAS), 2017, pp. 1216–1219.
[16] K. Khalil, O. Eldash, B. Dey, A. Kumar, and M. Bayoumi, "A novel reconfigurable hardware architecture of neural network," in Proc. IEEE 62nd Int. Midwest Symp. Circuits Syst. (MWSCAS), 2019, pp. 618–621.
[17] K. Khalil, O. Eldash, A. Kumar, and M. Bayoumi, "An efficient approach for neural network architecture," in Proc. 25th IEEE Int. Conf. Electron. Circuits Syst. (ICECS), 2018, pp. 745–748.
[18] S. Moosazadeh, E. Namazi, H. Aghababaei, A. Marto, H. Mohamad, and M. Hajihassani, "Prediction of building damage induced by tunnelling through an optimized artificial neural network," Eng. Comput., vol. 35, no. 2, pp. 579–591, 2019.
[19] Z. Zhang, K. Zhang, and Khelifi, Multivariate Time Series Analysis in Climate and Environmental Research. Cham, Switzerland: Springer, 2018.
[20] X. Lin et al., "All-optical machine learning using diffractive deep neural networks," Science, vol. 361, no. 6406, pp. 1004–1008, 2018.
[21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[22] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Rep. TR-2009, 2009.
[23] J. J. Hull, "A database for handwritten text recognition research," IEEE Trans. Pattern Anal. Mach. Intell., vol. 16, no. 5, pp. 550–554, May 1994.
[24] P. Colangelo, O. Segal, A. Speicher, and M. Margala, "Artificial neural network and accelerator co-design using evolutionary algorithms," in Proc. IEEE High Perform. Extreme Comput. Conf. (HPEC), 2019, pp. 1–8.

