
IAES International Journal of Robotics and Automation (IJRA)

Vol. 13, No. 4, December 2024, pp. 485~494


ISSN: 2722-2586, DOI: 10.11591/ijra.v13i4.pp485-494

Design and implementation of deep neural network hardware chip and its performance analysis

Aruna Pant, Adesh Kumar


Department of Electrical and Electronics Engineering, School of Advanced Engineering, UPES, Dehradun, India

ABSTRACT

The artificial neural network (ANN) with a single layer has a limited capacity to process data. Multiple neurons are connected in the human brain, and the actual capacity of the brain lies in the interconnectedness of multiple neurons. As a specified generalization of ANN, deep learning makes use of two or more hidden layers, which implies that a greater number of neurons are required to construct the model. A network with two or more hidden layers is referred to as a deep neural network, and the process of training such networks is referred to as deep learning. This research article focuses on the design of a multilayer, or deep, neural network for the target field programmable gate array (FPGA) device, the Spartan-6 (XC6SLX4-2TQG144) FPGA. The simulation is carried out using Xilinx ISE and ModelSim software. There are two hidden layers: (2×1) multiplexer blocks process twenty neurons into ten neuron outputs in the first hidden layer, and (1×2) demultiplexers perform the reverse expansion. The hardware utilization is estimated on the FPGA to compute the performance of the deep neural hardware chip based on memory, flip-flop, delay, and frequency parameters. The design is scalable and applicable to various FPGA devices, which makes the work novel. The result is an FPGA-based neuromorphic hardware acceleration platform with high speed, low power, and good real-time performance for discrete spike processing.

Article history: Received Mar 13, 2024; Revised Jun 9, 2024; Accepted Jul 17, 2024

Keywords: Control logic; Deep learning neural network; Hidden layers; Performance parameters; Spartan-6 FPGA

This is an open access article under the CC BY-SA license.

Corresponding Author:
Adesh Kumar
Department of Electrical and Electronics Engineering, School of Advanced Engineering, University of
Petroleum and Energy Studies (UPES)
Dehradun, India
Email: [email protected]

1. INTRODUCTION
Deep neural networks (DNNs) have gained great popularity and are used in a large number of applications. For applications such as detection, a specific hardware design of the multilayer neural network is required when a general-purpose computer is not sufficient, owing to its limitations in speed, chip cost, and energy consumption. A deep learning model with two hidden layers is used here. The multilayer neural network is made up of numerous layers between the input and output layers, with all hidden layers being fully connected.
Deep learning neural network implementation on field programmable gate arrays (FPGAs) presents
various challenges, including storage, resource requirements, and memory utilization [1]. Deep neural
networks are made up of numerous “deep” and hidden layers, and each one of them does a particular task.
Deep learning learns different levels of data representations and the underlying distribution of the data [2]. DNNs are widely used in video applications, such as object detection and real-time speech




translation. DNNs will become increasingly significant in medicine, robotics, and finance, as well as in weather forecasting and detection [3]. DNNs are now the basis for many modern artificial intelligence applications [4]. Since the groundbreaking use of DNNs in speech recognition and image recognition, the number of applications in which DNNs are used has skyrocketed. These deep neural networks are used in a wide range of applications, including self-driving cars, cancer detection, and complicated gameplay [5]. In many fields, multilayer neural networks can perform tasks more accurately than humans. Overall, multilayer artificial neural networks (ANNs), particularly deep networks, have proved better at handling complicated tasks such as image identification, speech recognition, language translation, and others, making them an essential component of modern artificial intelligence systems [6]. DNNs are used in a wide variety of artificial intelligence applications, including computer vision, speech recognition, robotics, and many more, and they can achieve cutting-edge accuracy on a wide variety of tasks [7]. The computational complexity of these networks is rather high. Better performance, higher bandwidth, and deterministic low latency are achieved with the use of FPGAs and HDL in the design of programmable processors. Parallel processing and parallel operations [8] are the foundation of ANN architectures for processing neuronal information; regular chips are unable to sustain a huge number of highly simultaneous processes [9]. Several distinct processors that enable parallel processing are included in artificial intelligence-based hardware chips.
The problem statement of the work is to design a deep neural network hardware chip and estimate its performance on FPGA. The chip is designed for 20 neuron inputs, each of 8 bits. Each of the 20 neuron inputs is multiplied by its corresponding 8-bit weight, so each multiplication in the multiplier gives a 16-bit output. In the first hidden layer, (2×1) multiplexers reduce the 20 outputs to 10; multiple (1×2) demultiplexers then convert the 10-neuron output back into a 20-neuron output, which is given to the second hidden layer. A control logic block is used in this neuromorphic hardware chip design to feed the multiplier outputs to each input of the hidden layer. The hidden layer output is then given to the output multiplexer, which has 20 inputs and a single output. This mux has a 5-bit selection line that can select any of the 20 outputs of the control logic. Hardware utilization characteristics such as slices used, look-up tables (LUTs), flip-flops (FFs), input/output blocks (IOBs), memory, and total propagation delay are used to evaluate the deep neural hardware chip's performance.
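To make the input-layer arithmetic concrete, the following is a minimal VHDL sketch of one 8-bit-by-8-bit neuron multiplier producing a 16-bit product, matching the widths stated above; the entity and port names are illustrative assumptions, not the authors' actual source code.

```vhdl
-- Minimal sketch of one input-layer multiplier (assumed names, not the
-- paper's code): an 8-bit neuron input times an 8-bit weight yields a
-- 16-bit product, matching the widths described in the text.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity neuron_multiplier is
    port (
        neuron_x  : in  std_logic_vector(7 downto 0);   -- neuron input Xn
        weight_w  : in  std_logic_vector(7 downto 0);   -- weight Wn
        product_y : out std_logic_vector(15 downto 0)   -- product Yn
    );
end entity neuron_multiplier;

architecture rtl of neuron_multiplier is
begin
    -- An unsigned 8x8 multiply fits exactly in 16 bits.
    product_y <= std_logic_vector(unsigned(neuron_x) * unsigned(weight_w));
end architecture rtl;
```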

2. RELATED WORK
Although software can easily create multilayer neural network models on computers, there are numerous challenges in designing deep neural networks on silicon. These challenges include implementing the multiplication of inputs with weights at each layer, designing the storage of network weights and the activation function, and selecting an appropriate number format for calculations. To achieve performance efficiency, each layer in a deep ANN architecture, which is the focus of this work, requires its own independent memory space for its weights. In a deep network with two or more hidden layers, the memory requirement for the weight matrices of the entire network grows rapidly as the network's size and memory accesses grow. Furthermore, computations within each layer should be executed in parallel for applications requiring high speed. These characteristics have a substantial influence on the efficiency and performance of DNNs. In this context, the FPGA is viable for implementing a DNN accelerator. The design of deep ANNs and the use of FPGAs are prevalent research areas [10], [11]. Convolutional neural networks (CNNs) have been used for real-time embedded systems, exploring the impartial classification of cutting-edge system-on-chips [12]. Multi-dimensional trade-offs are investigated to configure programmable accelerators for neural interpretation using the finest software libraries. The framework is open source to promote fair benchmarking assessments. It has become increasingly common to
design multilayer perceptron neural networks on reconfigurable hardware to carry out a wide range of
applications, including pattern recognition and classification. The purpose of the research by Huynh [13] was
to investigate the combined impact of neural network size and decreased precision number formats, which
are utilized for the representation of the ideal parameters, on the recognition rate of a handwritten digit
recognition system that is based on neural networks. ANN architecture [14] was evaluated on Virtex-5 FPGA
for pattern recognition and tested for scalability and hardware resource use. Deep learning is now the most
advanced solution for almost all significant machine-learning tasks in a wide range of domains [15], [16].
Deep learning techniques outperform typical machine learning algorithms that require human feature
extraction (handcrafted features) [17]. Deep learning models extract hierarchical features and perform better
as the amount of data increases [18]. There are various deep learning methods and architectures applied in the
design, such as multi-layer perceptron (MLP), deep belief network (DBN), autoencoder (AE), CNNs,
recurrent neural network (RNN) with long short-term memory (LSTM) and gated recurrent units (GRU),
generative adversarial network (GAN), deep reinforcement learning (DRL) [19]. These models have spanned
a wide range of fields and applications. Deep learning architectures have enabled autonomous driving (AD)


systems [20] to acquire a high level of precision in their performance; these systems are comprised of
numerous perception-level tasks. Traditional supervised learning methods are no longer applicable to the
many tasks that make up autonomous driving systems, except for perception. In computer vision
applications, CNN models have demonstrated encouraging outcomes in areas including robotics, autonomous
vehicle vision systems, and drone navigation [21]. In healthcare, CNNs have also outperformed other
methods, especially for image identification.
CNNs have been used to detect tumors and other types of lesions at a level comparable to experienced radiologists [22]. Brain magnetic resonance imaging (MRI) images of humans have been used with CNNs to detect Alzheimer's disease [23]. Deep learning methodologies [24] have proved effective in
addressing artificial intelligence difficulties, while FPGAs have been used to improve reconfigurable
computer hardware and software for artificial intelligence design. Multilayer networks have been designed using FPGAs [25]. Graphics processing units (GPUs) are unsuitable for embedded applications due
to their high-power consumption. Many academics have explored techniques to create deep neural
networks on low-power hardware [26]. The most expensive operations in deep convolutional neural networks (DCNNs) come from their numerous layers of three-dimensional convolutions, each of which has between tens and hundreds of filters. The restriction on flowing data in and out of the hardware chip accelerator, which is synthesized on the Xilinx Zynq-7000, is a challenge for systems that run DCNNs; these systems are required to pass 3D input maps to the hardware accelerators to perform the convolutions [27]. A high-level synthesis implementation was employed on a Virtex-6 FPGA board, and a CNN was utilized [28] to analyze the design flow. The FPGA resources can be decreased by up to 13 times compared to accelerators that use traditional scratchpad memories, while still maintaining the same level of performance. A fixed-point DNN has been designed on FPGA without using any external memory; its power consumption and execution time are compared with a GPU-based implementation [29]. To construct feed-forward deep neural networks in real time, GPUs are commonly
used due to the high amount of arithmetic and memory operations required [30]. The implementation of an
optimized deep learning architecture was carried out on a parallel computing platform. This architecture
featured flexible layer topologies and fast matrix operation kernels, and the design supports minimal hardware utilization [31]. The integration of several types of deep architectures into a unified neural
training-testing system is accomplished by the implementation of layer-wise procedures that have been
meticulously designed. A neural network hardware model can be designed on FPGA; such a model can contain many types of layers, connections, and data [32]. Retraining via backpropagation was created as an alternative to direct weight quantization [33]. Feedforward DNNs with several hidden layers function well in many applications, but they require complex hardware. The system hardware complexity can be minimized by reducing the weight and signal word lengths. Quantized weights and fixed-point signals are used in backpropagation for retraining and output computation, while higher-precision numbers are used to update the networks [34]. The developed networks typically used 2 to 8 bits for weights and more than 7 bits to represent signals in analog or high-precision fixed point [35]. Limited work has been done on hardware chip design and synthesis specifically targeting FPGA hardware for deep neural network processing. The problem statement of the presented work is to design a scalable hardware chip of a deep neural network and evaluate the system performance in terms of delay, frequency, LUTs, and memory.

3. METHODS
The system design includes an input layer, two hidden layers, and one output layer. The deep neural hardware chip uses two hidden layers, and multipliers are used for multiplying the neuron inputs with the neuron weights; apart from the (2×1) multiplexer component, a (1×2) demultiplexer component and one (20×1) multiplexer component are used. Figure 1 presents the basic building blocks of the multilayer feed-forward artificial neural network (MFFANN). Figure 2 represents the design of the multilayer neural network, consisting of one input layer, two hidden layers, and one output layer. The design of this multilayer ANN is done in VHSIC Hardware Description Language (VHDL) targeting an FPGA Spartan-6 device. The main parts of the system design are as follows.
− Multiplier: The input layer is implemented using multipliers that multiply the 20 neuron inputs (0-19) with the 20 neuron weights; each neuron input is 8 bits, and each multiplication gives a single 16-bit output. With X0, X1, ..., X19 as the neuron inputs and W0, W1, ..., W19 as the weights, multiplying them in the multiplier gives the 16-bit outputs Y0, Y1, ..., Y19.
− Control logic: The control logic implements the hidden layer. The multiplier outputs Y0, Y1, ..., Y19 are given to the control logic, whose inputs are taken as M0, M1, ..., M19, each 16 bits wide; N0, N1, ..., N19 are the 16-bit outputs of the control logic. After this, the (2×1) multiplexers are used.
− Multiplexer (2×1): Each pair of the 20 control logic outputs is given to a (2×1) multiplexer, so the 20 outputs of the control logic become 10 outputs.
− Demultiplexer (1×2): The 10 outputs of the (2×1) multiplexers are given to (1×2) demultiplexers, which expand them back to 20 outputs; these are given to the control logic of the second hidden layer. The 20 outputs of that control logic are given to the 20-to-1 output multiplexer, so a single 16-bit output is obtained. A VHDL sketch of the multiplexer and demultiplexer blocks follows this list.
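As referenced above, here is a minimal VHDL sketch of the 16-bit (2×1) multiplexer and (1×2) demultiplexer blocks; the entity and port names are assumptions for illustration, not the paper's actual code.

```vhdl
-- Illustrative 16-bit (2x1) multiplexer: two hidden-layer words in, one out.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity mux2x1_16 is
    port (
        a, b : in  std_logic_vector(15 downto 0);  -- pair of control-logic outputs
        sel  : in  std_logic;                      -- select bit
        y    : out std_logic_vector(15 downto 0)
    );
end entity mux2x1_16;

architecture rtl of mux2x1_16 is
begin
    y <= a when sel = '0' else b;
end architecture rtl;

-- Illustrative 16-bit (1x2) demultiplexer: routes one word to one of two
-- outputs, expanding the 10 mux outputs back to 20 for the second hidden layer.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity demux1x2_16 is
    port (
        d      : in  std_logic_vector(15 downto 0);
        sel    : in  std_logic;                    -- select bit
        y0, y1 : out std_logic_vector(15 downto 0)
    );
end entity demux1x2_16;

architecture rtl of demux1x2_16 is
begin
    y0 <= d when sel = '0' else (others => '0');
    y1 <= d when sel = '1' else (others => '0');
end architecture rtl;
```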

Figure 1. Multilayer feed-forward artificial neural network (MFFANN)

Figure 2. Block diagram representation of MFFANN

Adders and multipliers are important components utilized in ANNs to carry out the crucial mathematical operations required for processing and learning; a comprehensive look at their roles is given in [35]. In an ANN, each neuron receives several inputs, each of which has an associated weight. The net input to the neuron is formed by adding up all these weighted contributions. Each input received by a neuron is multiplied by the weight that corresponds to that input. This multiplication, which alters the influence that each input has on the neuron's output, is an extremely important step and takes place for every input the neuron receives. Multipliers are also utilized to calculate gradients during the training of ANNs with backpropagation, where derivatives are multiplied to compute the error gradient with respect to the weights and inputs. It is essential for hardware accelerators such as FPGAs and application-specific integrated circuits (ASICs) used for neural networks to have properly and efficiently implemented adders and multipliers. The speed of ANNs can be greatly increased and their power consumption dramatically decreased if they are designed with optimized adders and multipliers, such as carry-lookahead adders and Booth multipliers, respectively. The concept of parallelism is utilized in hardware implementations: designing multipliers and adders to run in parallel allows the simultaneous computation of many neuron outputs, which speeds up inference and training [36]. The characteristics (features) of a neural network refer to the variables and qualities of the dataset [37]. To enhance the accuracy of a model, it is customary to use a subset of variables; therefore, the features of a neural network reside in the input layer, rather than in the nodes of the hidden layer. In ANNs, the basic activities of summing weighted inputs and computing gradients are performed by adders and multipliers, which are essential components for these operations [38]. The performance and energy efficiency of neural network hardware accelerators are directly correlated with the efficiency of their implementation.
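Where the text above describes summing weighted inputs, a minimal VHDL sketch of such a multiply-accumulate datapath is shown below, reduced to four inputs for brevity; the entity name, ports, and the four-input width are illustrative assumptions, not the paper's design files.

```vhdl
-- Illustrative weighted-sum (multiply-accumulate) datapath for one neuron,
-- reduced to four 8-bit inputs/weights for brevity. Each 8x8 product is
-- 16 bits; two extra result bits absorb the carries of the 4-term sum.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity neuron_sum is
    port (
        x0, x1, x2, x3 : in  std_logic_vector(7 downto 0);  -- neuron inputs
        w0, w1, w2, w3 : in  std_logic_vector(7 downto 0);  -- weights
        net_input      : out std_logic_vector(17 downto 0)  -- sum of products
    );
end entity neuron_sum;

architecture rtl of neuron_sum is
begin
    -- net_input = x0*w0 + x1*w1 + x2*w2 + x3*w3; products are resized to
    -- 18 bits before adding so the four-term sum cannot overflow.
    net_input <= std_logic_vector(
        resize(unsigned(x0) * unsigned(w0), 18) +
        resize(unsigned(x1) * unsigned(w1), 18) +
        resize(unsigned(x2) * unsigned(w2), 18) +
        resize(unsigned(x3) * unsigned(w3), 18));
end architecture rtl;
```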

4. RESULTS AND DISCUSSION


The design structure of the multilayer feed-forward neural network hardware chip consists of multiplier, control logic, (2×1) multiplexer, (1×2) demultiplexer, and (20×1) multiplexer components. Figure 3 shows the RTL representation of the multilayer neural network.

Figure 3. RTL of multilayer ANN

Figure 4 presents the simulation inputs verified in the simulation, and Figure 5 presents the simulation waveform outputs of the multilayer neural network. Figure 6 presents the multilayer hardware chip utilization chart, and Figure 7 presents the corresponding utilization pie chart. The RTL view presents the complete details of the input and output pins in the design. The simulation input and output pins with stimulus inputs are shown in the waveforms in the form of binary data.


Figure 4. Simulation waveform of inputs for multilayer ANN

Neuron_Xn_input (7:0), for n = 0 to 19, presents the 8-bit input of neuron Xn, and Neuron_Weight_Wn_input (7:0) presents the corresponding 8-bit weight input. Neuron_mid_layer_source_address (4:0) presents the address of the mid-layer blocks that support the design, and Neuron_mid_layer_destination_address (4:0) presents the destination address of the mid layers. Mux21_selector is used to select the specific multiplexer block at the input, and Demultiplexer12_selector is used to select the specific demultiplexer block at the output. Port_address (4:0) is the address supporting the memory blocks. The complete output of the entire module is taken from neuron_output (15:0).
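To illustrate the output stage that drives neuron_output (15:0), the sketch below implements a 20-to-1 multiplexer of 16-bit words with a 5-bit selection line, as described in the introduction; packing the twenty words into one vector and the names used are assumptions, not the paper's code.

```vhdl
-- Illustrative 20-to-1 output multiplexer: a 5-bit select picks one of the
-- twenty 16-bit control-logic outputs, packed into a single input vector.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity mux20x1_16 is
    port (
        n_in  : in  std_logic_vector(20*16 - 1 downto 0); -- N0..N19, packed
        sel   : in  std_logic_vector(4 downto 0);         -- 5-bit selection line
        n_out : out std_logic_vector(15 downto 0)         -- selected 16-bit word
    );
end entity mux20x1_16;

architecture rtl of mux20x1_16 is
begin
    process (n_in, sel)
        variable idx : integer range 0 to 31;
    begin
        idx := to_integer(unsigned(sel));
        if idx < 20 then
            n_out <= n_in(16 * idx + 15 downto 16 * idx);
        else
            n_out <= (others => '0');  -- the 12 unused select codes
        end if;
    end process;
end architecture rtl;
```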

Figure 5. Simulation waveform outputs of multilayer ANN


[Bar chart: hardware parameters utilized (slice registers, number of IOBs, and LUT flip-flops) for the multilayer ANN with 20 neuron inputs.]
Figure 6. Multilayer hardware chip utilization chart

[Pie chart: FPGA hardware utilization percentages: slice registers 20%, LUT FFs 16.33%, IOBs 51.96%, DSP blocks 100%.]
Figure 7. Multilayer hardware chip utilization pie chart

Table 1 lists the hardware parameters utilized by the multilayer hardware chip; the device utilization report for this neuromorphic hardware chip is taken from the Xilinx software. The number of slice registers used in the multilayer hardware chip is 960 out of a total of 4800. Moreover, the number of slice LUTs used is 1547 out of a total of 2400. Input/output blocks used are 53 out of a total of 102. The number of DSP elements used is 8 out of a total of 8. The total delay of the multilayer hardware chip is 11.688 ns. The memory utilized for the multilayer hardware chip is 4667944 kilobytes. The target device used for simulation and synthesis is the Spartan-6 FPGA (XC6SLX4-2TQG144). The Spartan-6 family offers advanced system-integration capabilities at the lowest possible overall cost for high-volume applications, and it is built on a mature 45-nanometer low-power copper process. DNNs require high memory-to-processing-unit data transmission rates, and memory bandwidth can limit large-scale models with many layers and parameters. When trained and run on non-optimal hardware, DNNs can consume a lot of power; this affects data centers, mobile devices, and embedded devices, where energy efficiency is crucial. When working with larger models or more complex DNN tasks, scalability requires efficient parallelization across numerous processing units. Due to the complexity and resource requirements of deep neural networks, hardware implementation has become harder.
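For reference, the utilization percentages quoted in Figure 7 and in the conclusion follow directly from the used/available ratios reported here and in Table 1:

```latex
% Utilization is the used/available ratio expressed as a percentage.
\[
\text{utilization} = \frac{\text{used}}{\text{available}} \times 100\%
\]
\[
\frac{352}{2155} \approx 16.33\%, \qquad
\frac{53}{102} \approx 51.96\%, \qquad
\frac{960}{4800} = 20.00\%, \qquad
\frac{8}{8} = 100\%
\]
```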

Table 1. Hardware parameters utilized by the multilayer hardware chip

Parameter                 Value
ANN chip size             ANN-20
Multipliers used          20
Target device (FPGA)      Spartan-6 (XC6SLX4-2TQG144)
LUT FFs used              352/2155
Slices                    960/4800
Delay (ns)                6.359
IOBs                      53/102
Memory (KB)               4665304


5. CONCLUSION
A model of neural networks is called a deep neural network when it uses two or more hidden layers. DNN chips based on FPGAs can be optimized for particular neural network models or tasks, which could lead to better performance than generic alternatives. Deep learning is distinguished by the utilization of artificial neural networks with multiple layers of processing units, sometimes known as neurons or nodes, to learn representations of the input. These neural networks can automatically identify patterns or features in data without the need for explicit programming. The multilayer hardware chip for 20-neuron input was designed successfully in the Spartan-6 (XC6SLX4-2TQG144) FPGA using Xilinx ISE 14.7. The RTL, internal schematic, and simulation of the multilayer hardware are verified. The hardware parameter utilization reports show that the number of LUT flip-flops used is 352 out of 2155, which is 16.33%. The number of bonded IOBs used is 53 out of the available total of 102 IOBs, which is 51.96%. The number of DSP elements used is 8 out of 8, which is 100.00%. The number of slice registers used is 960 out of the total available 4800, which shows 20.00% utilization. The multilayer ANN design supported a frequency of 313 MHz, a delay of 6.359 ns, a memory of 4665304 KB, and an optimal power estimation of 11.45 nW. The Spartan-6 family includes new, more efficient dual-register 6-input LUT logic and a large range of built-in system-level blocks, offering the best mix of cost, power, and performance.

ACKNOWLEDGEMENTS
Thanks to the UPES VLSI Design and DSP Lab for providing the platform for carrying out the work.

REFERENCES
[1] M. Wielgosz and M. Karwatowski, “Mapping neural networks to FPGA-based IoT devices for ultra-low latency processing,”
Sensors, vol. 19, no. 13, Jul. 2019, doi: 10.3390/s19132981.
[2] A. Kumar, P. Sharma, M. K. Gupta, and R. Kumar, “Machine learning based resource utilization and pre-estimation for network
on chip (NoC) communication,” Wireless Personal Communications, vol. 102, no. 3, pp. 2211–2231, Oct. 2018, doi:
10.1007/s11277-018-5376-3.
[3] W. Haensch, T. Gokmen, and R. Puri, “The next generation of deep learning hardware: analog computing,” in Proceedings of the
IEEE, Jan. 2019, vol. 107, no. 1, pp. 108–122, doi: 10.1109/JPROC.2018.2871057.
[4] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi:
10.1038/nature14539.
[5] L. Deng et al., “Recent advances in deep learning for speech research at Microsoft,” in 2013 IEEE International Conference on
Acoustics, Speech and Signal Processing, May 2013, pp. 8604–8608, doi: 10.1109/ICASSP.2013.6639345.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,”
Communications of the ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
[7] D. Silver et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489,
2016, doi: 10.1038/nature16961.
[8] D. Wang, L. Deng, P. Tang, C. Ma, and J. Pei, “FPGA-based neuromorphic computing system with a scalable routing network,”
in 2015 15th Non-Volatile Memory Technology Symposium (NVMTS), Oct. 2015, pp. 1–4, doi: 10.1109/NVMTS.2015.7457432.
[9] M. A. Dias and D. A. P. Ferreira, “Deep learning in reconfigurable hardware: a survey,” in 2019 IEEE International Parallel and
Distributed Processing Symposium Workshops (IPDPSW), May 2019, pp. 95–98, doi: 10.1109/IPDPSW.2019.00026.
[10] N. Nedjah, R. M. da Silva, and L. de Macedo Mourelle, “Compact yet efficient hardware implementation of artificial neural
networks with customized topology,” Expert Systems with Applications, vol. 39, no. 10, pp. 9191–9206, Aug. 2012, doi:
10.1016/j.eswa.2012.02.085.
[11] J. Park and W. Sung, “FPGA based implementation of deep neural networks using on-chip memory only,” in 2016 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar. 2016, pp. 1011–1015, doi: 10.1109/ICASSP.2016.7471828.
[12] M. Verucchi et al., “A systematic assessment of embedded neural networks for object detection,” in 2020 25th IEEE International
Conference on Emerging Technologies and Factory Automation (ETFA), Sep. 2020, pp. 937–944, doi: 10.1109/ETFA46521.2020.9212130.
[13] T. V. Huynh, “Design space exploration for a single-FPGA handwritten digit recognition system,” in 2014 IEEE Fifth
International Conference on Communications and Electronics (ICCE), Jul. 2014, pp. 291–296, doi: 10.1109/CCE.2014.6916717.
[14] T. V. Huynh, “Evaluation of artificial neural network architectures for pattern recognition on FPGA,” International Journal of
Computing and Digital Systems, vol. 6, no. 3, pp. 133–138, 2017, doi: 10.12785/IJCDS/060305.
[15] A. Kumar, G. Verma, M. K. Gupta, M. Salauddin, B. K. Rehman, and D. Kumar, “3D multilayer mesh NoC communication and
FPGA synthesis,” Wireless Personal Communications, vol. 106, no. 4, pp. 1855–1873, Jun. 2019, doi: 10.1007/s11277-018-5724-3.
[16] A. Owens, P. Isola, J. McDermott, A. Torralba, E. H. Adelson, and W. T. Freeman, “Visually indicated sounds,” in 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 2405–2413, doi: 10.1109/CVPR.2016.264.
[17] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, “Deep learning for smart manufacturing: methods and applications,” Journal of
Manufacturing Systems, vol. 48, pp. 144–156, Jul. 2018, doi: 10.1016/j.jmsy.2018.01.003.
[18] A. Goel, A. K. Goel, and A. Kumar, “The role of artificial neural network and machine learning in utilizing spatial information,”
Spatial Information Research, vol. 31, no. 3, pp. 275–285, 2023, doi: 10.1007/s41324-022-00494-x.
[19] P. P. Groumpos, “Deep learning vs. wise learning: A critical and challenging overview,” IFAC-PapersOnLine, vol. 49, no. 29, pp.
180–189, 2016, doi: 10.1016/j.ifacol.2016.11.099.
[20] B. R. Kiran et al., “Deep reinforcement learning for autonomous driving: A survey,” IEEE Transactions on Intelligent
Transportation Systems, vol. 23, no. 6, pp. 4909–4926, Jun. 2022, doi: 10.1109/TITS.2021.3054625.
[21] A. Giusti et al., “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and
Automation Letters, vol. 1, no. 2, pp. 661–667, Jul. 2016, doi: 10.1109/LRA.2015.2509024.
[22] R. J. Chen et al., “Dermatologist-level classification of skin cancer with deep neural networks,” Nature, vol. 10, no. 3, pp. 1–8,
2018.


[23] B. Khagi, C. G. Lee, and G.-R. Kwon, “Alzheimer’s disease classification from brain MRI based on transfer learning from CNN,”
in 2018 11th Biomedical Engineering International Conference (BMEiCON), Nov. 2018, pp. 1–4, doi:
10.1109/BMEiCON.2018.8609974.
[24] T. Belabed, M. G. F. Coutinho, M. A. C. Fernandes, C. V. Sakuyama, and C. Souani, “User driven FPGA-based design automated
framework of deep neural networks for low-power low-cost edge computing,” IEEE Access, vol. 9, pp. 89162–89180, 2021, doi:
10.1109/ACCESS.2021.3090196.
[25] S. Hariprasath and T. N. Prabakar, “FPGA implementation of multilayer feed forward neural network architecture using VHDL,”
2012 International Conference on Computing, Communication and Applications, ICCCA 2012, 2012, doi:
10.1109/ICCCA.2012.6179225.
[26] Y. Zhou, S. Redkar, and X. Huang, “Deep learning binary neural network on an FPGA,” in 2017 IEEE 60th International
Midwest Symposium on Circuits and Systems (MWSCAS), Aug. 2017, pp. 281–284, doi: 10.1109/MWSCAS.2017.8052915.
[27] A. Dundar, J. Jin, V. Gokhale, B. Martini, and E. Culurciello, “Memory access optimized routing scheme for deep networks on a
mobile coprocessor,” in 2014 IEEE High Performance Extreme Computing Conference (HPEC), Sep. 2014, pp. 1–6, doi:
10.1109/HPEC.2014.7040963.
[28] M. Peemen, A. A. A. Setio, B. Mesman, and H. Corporaal, “Memory-centric accelerator design for Convolutional Neural Networks,”
in 2013 IEEE 31st International Conference on Computer Design (ICCD), Oct. 2013, pp. 13–19, doi: 10.1109/ICCD.2013.6657019.
[29] Y. Y. Ghadi et al., “Machine learning solutions for the security of wireless sensor networks: a review,” IEEE Access, vol. 12, pp.
12699–12719, 2024, doi: 10.1109/ACCESS.2024.3355312.
[30] Z. Chen, J. Wang, H. He, and X. Huang, “A fast deep learning system using GPU,” in 2014 IEEE International Symposium on
Circuits and Systems (ISCAS), Jun. 2014, pp. 1552–1555, doi: 10.1109/ISCAS.2014.6865444.
[31] Y. Zhang and S. Zhang, “Optimized deep learning architectures with fast matrix operation kernels on parallel platform,” in 2013
IEEE 25th International Conference on Tools with Artificial Intelligence, Nov. 2013, pp. 71–78, doi: 10.1109/ICTAI.2013.21.
[32] R. Zhao et al., “Hardware compilation of deep neural networks: an overview,” in 2018 IEEE 29th International Conference on
Application-specific Systems, Architectures and Processors (ASAP), Jul. 2018, pp. 1–8, doi: 10.1109/ASAP.2018.8445088.
[33] J. Kim, K. Hwang, and W. Sung, “X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks,” in
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2014, pp. 7510–7514, doi:
10.1109/ICASSP.2014.6855060.
[34] K. Hwang and W. Sung, “Fixed-point feedforward deep neural network design using weights +1, 0, and −1,” in 2014 IEEE
Workshop on Signal Processing Systems (SiPS), Oct. 2014, pp. 1–6, doi: 10.1109/SiPS.2014.6986082.
[35] S. Anwar, K. Hwang, and W. Sung, “Fixed point optimization of deep convolutional neural networks for object recognition,” in
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 2015, pp. 1131–1135, doi:
10.1109/ICASSP.2015.7178146.
[36] A. Goel, A. K. Goel, and A. Kumar, “Performance analysis of multiple input single layer neural network hardware chip,”
Multimedia Tools and Applications, vol. 82, no. 18, pp. 28213–28234, Jul. 2023, doi: 10.1007/s11042-023-14627-3.
[37] A. Devrari and A. Kumar, “Turbo encoder and decoder chip design and FPGA device analysis for communication system,”
International Journal of Reconfigurable and Embedded Systems (IJRES), vol. 12, no. 2, pp. 174–185, Jul. 2023, doi:
10.11591/ijres.v12.i2.pp174-185.
[38] A. S. Rawat, A. Rana, A. Kumar, and A. Bagwari, “Application of multi layer artificial neural network in the diagnosis system: a
systematic review,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 7, no. 3, pp. 138–142, Aug. 2018, doi:
10.11591/ijai.v7.i3.pp138-142.

BIOGRAPHIES OF AUTHORS

Aruna Pant is working as a junior research fellow (JRF) in the Department of Electrical and Electronics Engineering, UPES, Dehradun, India. She has an M.Tech. in Electronics and Communication Engineering from Amity University, Lucknow, India. Her areas of interest are biomedical signal processing, EEG signals, and digital signal processing. She has 12 years of teaching experience. She can be contacted at [email protected].

Adesh Kumar has been working as a professor in the Department of Electrical and Electronics Engineering, School of Advanced Engineering, The University of Petroleum and Energy Studies (UPES), Dehradun, India since 2010. He received a B.Tech. in Electronics and Communication Engineering from UPTU, Lucknow, India in 2006, an M.Tech. (Hons) in Embedded Systems Technology from SRM University, Chennai in 2008, and a Ph.D. in Electronics Engineering from UPES, Dehradun, India in 2014. He has also worked as a senior engineer at Tata Elxsi Limited, Bangalore, and as a faculty member at ICFAI University, Dehradun. His areas of interest are VLSI design, embedded systems design, telecommunications, and signal processing. He has supervised 8 Ph.D. scholars, and 5 candidates are doing research under his supervision. He has worked on more than 10 editorial assignments for edited books, conference proceedings, and journals, with more than 2000 citations, an H-index of 25, and an i10-index of 45. He has 16 years of teaching and research experience and has published more than 120 research papers in international peer-reviewed (SCI/Scopus) journals and conferences. He can be contacted at [email protected] or [email protected].

